First, we load the needed libraries.

library(datasets)
library(rDML)

We will use the iris dataset.

# Loading the dataset
data(iris)
X = iris[1:4]                           # features: the four numeric columns
y = as.array(as.numeric(iris$Species))  # labels: Species coded as 1, 2, 3
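
A quick sanity check on the prepared data (base R only): X should be a data frame with the four numeric features, and y should contain the three classes coded as 1, 2 and 3, with 50 samples each.

str(X)
table(y)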

Let’s use the k-NN classifier together with a DML algorithm, in this case NCA (Neighbourhood Components Analysis).

nca = dml$NCA()                                               # the distance metric learner
knn = distance_clf$kNN(n_neighbors = 7, dml_algorithm = nca)  # 7-NN using the learned distance

Now, we fit the transformer and the predictor.

nca$fit(X,y)
knn$fit(X,y)
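
At this point the transformer holds a learned linear map. As a quick, hedged inspection (assuming rDML mirrors pyDML's accessors, where transformer() returns the learned linear map L and metric() returns the metric matrix M; check your rDML version if these names differ), we can verify the relation M = t(L) %*% L:

# Assumed accessors mirroring pyDML; verify they exist in your rDML version.
L = nca$transformer()       # learned linear map L (assumed accessor)
M = nca$metric()            # learned metric matrix M (assumed accessor)
max(abs(M - t(L) %*% L))    # should be ~0: the metric satisfies M = t(L) %*% L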

We can now predict labels with the k-NN classifier under the learned distance,

# Predictions for the training set. Each sample is predicted with itself left out (leave-one-out).
knn$predict()
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
##  [71] 3 2 3 2 2 2 2 3 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3
## [106] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [141] 3 3 3 3 3 3 3 3 3 3
# We can also make predictions for newly collected data.
X_ = matrix(nrow = 3, ncol = 4, data = c(1,0,0,0,
                                         1,1,0,0,
                                         1,1,1,0))
knn$predict(X_)
## [1] 1 1 1

and see the classification scores.

# Score on the training set (leave-one-out)
knn$score()
## [1] 0.9733333
# Scoring test data
y_ = as.numeric(c(1,2,2))
knn$score(X_,y_)
## [1] 0.3333333
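
The score here is just the mean accuracy of the predictions, so the test value above can be reproduced by hand using only base R and calls already shown:

# score(X_, y_) is the fraction of correct predictions; check it manually.
y_pred = knn$predict(X_)
mean(y_pred == y_)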

Another interesting classifier is NCMC (Nearest Class with Multiple Centroids). This classifier assigns each sample the class of its nearest centroid. The number of centroids can be chosen independently for each class, and the centroids themselves are computed by running k-Means on the subset of data belonging to each class. We can use it in the same way as the previous classifier.

ncmc = distance_clf$NCMC_Classifier(centroids_num = c(2,3,4))  # 2, 3 and 4 centroids for classes 1, 2 and 3
ncmc$fit(X,y)
ncmc$score(X,y)
## [1] 0.9733333
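
To make the decision rule concrete, here is a minimal base-R sketch of what the classifier computes (an illustrative reimplementation using the Euclidean distance, not rDML's actual code): k-Means is run on each class's samples, and every point is then assigned the class of its nearest centroid.

# Illustrative sketch of the NCMC decision rule, not rDML's implementation.
set.seed(0)
Xm = as.matrix(X)
centroids_num = c(2, 3, 4)
centroids = do.call(rbind, lapply(1:3, function(cl) {
  km = kmeans(Xm[y == cl, ], centers = centroids_num[cl])  # k-Means per class
  cbind(km$centers, class = cl)                            # tag centroids with their class
}))
nearest_class = function(x) {
  d = rowSums(sweep(centroids[, 1:4], 2, x)^2)   # squared distances to all centroids
  centroids[which.min(d), "class"]               # class of the nearest centroid
}
mean(apply(Xm, 1, nearest_class) == y)           # should be close to ncmc$score(X, y)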

If we want to use this classifier with the learned distance, we can transform the data first.

Lx = nca$transform()  # training data mapped by the learned transformation
ncmc = distance_clf$NCMC_Classifier(centroids_num = c(2,3,4))
ncmc$fit(Lx,y)
ncmc$score(Lx,y)
## [1] 0.9866667
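
This works because the learned distance is exactly the Euclidean distance in the transformed space. As a hedged cross-check, leave-one-out k-NN on the transformed data (knn.cv from the class package, which ships with R) should roughly match knn$score() from before; tie-breaking may differ slightly between implementations:

# Leave-one-out 7-NN on the transformed data; compare with knn$score().
library(class)
mean(knn.cv(Lx, as.factor(y), k = 7) == y)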