benchmarks Benchmarking Scripts for Mlpack's LMNN, Shogun's LMNN & Matlab's LMNN

Benchmarking scripts for Mlpack, Shogun & Matlab's LMNN.

Jun 15 '18 04:06 manish7294

@iglesias @manish7294 has some questions about the LMNN implementation in Shogun :)

Jun 18 '18 14:06 vigsterkr

@iglesias Hi! Hope you are doing well :) I am benchmarking LMNN on both mlpack and Shogun, and I am using Shogun's KNN to measure accuracy on the transformed data. Here is the problem: if the query set is the same as the training set, Shogun's KNN considers the query point itself to be its first nearest neighbor (giving 100% accuracy for k = 1). mlpack's KNN does not do this; it avoids returning the point itself as a nearest neighbor, so in this case Shogun's second nearest neighbor is mlpack's first nearest neighbor. Can you think of a way to put both libraries on the same page for benchmarking purposes? Or is there any way to avoid the point itself being considered the first nearest neighbor in Shogun? Any kind of help will be appreciated :)

Jun 18 '18 15:06 manish7294

Hi @manish7294. So this is about KNN.

What about the following: in mlpack use KNN with k=k' and in Shogun use KNN with k=k'+1. Then you could compare the nearest neighbors given by mlpack with the 2nd to (k'+1)th nearest neighbors given by Shogun.

Does this make sense to you?

Jun 19 '18 09:06 iglesias

@iglesias Thanks for looking into this. We thought about that, but it would create a problem when calculating predictions. With Shogun's KNN prediction, the point itself will always be counted as one of its own nearest neighbors, which will introduce some error into the measured accuracy. Let me give an example: say we have 2 classes and want k = 2, so we take mlpack's k as 2 and Shogun's k as 3. Suppose the first neighbor is of class 1 and the second is of class 2. Ideally (as per what mlpack's KNN does) we should predict the point as class 1. But since we took Shogun's k as 3, and supposing the point itself has label 2 (it will always be counted in the prediction), the prediction would be label 2, which differs from mlpack's KNN. On second thought, if I assume that Shogun has a distance-weighted KNN (not sure) running behind the scenes, then the output will always be the label of the point itself, no matter what k we choose.

Jun 19 '18 11:06 manish7294

Would it be possible to just get the k+1 nearest neighbors from Shogun itself? If that is the case, then we can do the classification ourselves (and ensure that we are always doing it the same way). If we can't get those from Shogun, maybe it is better to use a different technique for the nearest neighbor search. I don't think it makes such a huge difference either way, since we aren't timing the nearest neighbor search, we are just using it for finding the true nearest neighbors to calculate a metric.

Jun 19 '18 13:06 rcurtin
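
For illustration only, here is a minimal sketch of the idea suggested above: fetch k+1 neighbors, drop the query point itself, and do the classification by hand so that every library is scored the same way. This is not the code from this PR. The brute-force search is a stand-in for whatever neighbor search a library provides, and all function names (`search_with_self`, `drop_self`, `majority_vote`, `knn_accuracy`) as well as the tie-breaking rule are assumptions made for the example.

```python
import math
from collections import Counter


def search_with_self(points, k_plus_one):
    """Stand-in for a library search that returns the query point itself.

    Returns, for each point, the indices of its k_plus_one nearest points
    (Euclidean distance, nearest first, ties broken by index).
    """
    result = []
    for p in points:
        order = sorted(range(len(points)),
                       key=lambda j: (math.dist(p, points[j]), j))
        result.append(order[:k_plus_one])
    return result


def drop_self(neighbor_indices, query_index, k):
    """Remove the query point from its own neighbor list and keep k entries."""
    kept = [j for j in neighbor_indices if j != query_index]
    return kept[:k]


def majority_vote(neighbor_labels):
    """Most frequent label; on a tie, the tied label seen nearest wins.

    The tie-breaking rule is an assumption, not taken from either library.
    """
    counts = Counter(neighbor_labels)
    best = max(counts.values())
    for label in neighbor_labels:  # ordered nearest first
        if counts[label] == best:
            return label


def knn_accuracy(points, labels, k):
    """Accuracy when every point is classified from its k nearest other points."""
    with_self = search_with_self(points, k + 1)
    correct = 0
    for i, candidates in enumerate(with_self):
        neighbors = drop_self(candidates, i, k)
        predicted = majority_vote([labels[j] for j in neighbors])
        correct += predicted == labels[i]
    return correct / len(points)


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    labels = [0, 0, 1, 1]
    print(knn_accuracy(points, labels, k=1))
```

Because the vote is taken only after the query point has been removed, the point's own label can never influence its prediction, which is the problem described in the example with k = 2 and k = 3 above.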

Thanks all for helping out. Hopefully, the new custom KNN accuracy predictor will work as expected :)

Jun 19 '18 16:06 manish7294

No problem then, I will add it for sure :)

Jun 19 '18 19:06 manish7294

Everything looks good to me, but the build will fail until LMNN is part of the newest release of mlpack. So we can wait to merge until then, and we can be sure to release a newer version of mlpack soon-ish (before the end of the summer) with the LMNN support. For now, you can use the branch to run timing tests.

Jun 27 '18 19:06 rcurtin

Do you think it would be useful to include mlpack-master as another library? That would allow us to merge this earlier.

Jun 28 '18 17:06 zoq

@mlpack-jenkins test this please

Jul 09 '18 16:07 rcurtin

@mlpack-jenkins test this please

Jul 09 '18 18:07 rcurtin

@mlpack-jenkins test this please

Jul 09 '18 18:07 rcurtin

Can one of the admins verify this patch?

Jul 09 '18 18:07 mlpack-jenkins
