NearestNeighbors.jl icon indicating copy to clipboard operation
NearestNeighbors.jl copied to clipboard

Approximate Nearest Neighbors algorithms

Open Datseris opened this issue 7 years ago • 13 comments

Hello!

I am guessing @KristofferC already knows this, but there is a "new" approach to nearest neighbors called "approximate nearest neigbhors" for the kNN problem. For many applications these approaches are just as good as the "exact" methods. JuliaDynamics is very interested on this!

From the discussions that lead to #69 , I was told that the "state of the art"/"fastest" algorithm (faster than the one proposed in #69), is the one described here: https://arxiv.org/abs/1603.09320 that uses "Hierarchical Navigable Small World" graphs (hnsw). So that if I wanted to implement a faster kNN approach, I should do this one instead of the one in #69 .

In addition it is very interesting to share these benchmarks: https://github.com/erikbern/ann-benchmarks which are in python but they have a lot of different algortithms. This gives a nice perspective.

What is the consensus on having the hnsw method in this repo? Do we agree on that (because of the "approximate")? I think it fits, but a new repo in JuliaDynamics can also be made if putting it here is a problem.

By the way, there is already some code for this method: (C++, header only https://github.com/nmslib/hnsw )

@andyferris I am tagging you as well because you were interested in the discussion in #69 . @JonasIsensee was also mentioned in the proposal of this method.


p.s.: I intend to use some funding I got to get someone contribute a faster kNN method

Datseris avatar Sep 11 '18 09:09 Datseris

Makes sense to have it here in my opinion.

KristofferC avatar Sep 11 '18 12:09 KristofferC

Would be nice to run that benchmark here as well.

KristofferC avatar Sep 11 '18 13:09 KristofferC

They should surely be connected and share as much of the same interface as possible. However I really like NearestNeighbors.jl not having a binary dependency... Maybe a separate extension package would be better here?

JonasIsensee avatar Sep 21 '18 09:09 JonasIsensee

Hm, I didn't realize there was any thought of adding any binary dependencies to this repo.

KristofferC avatar Sep 21 '18 12:09 KristofferC

Oh, maybe it was just me. I figured, we would first try a wrapper for the existing C++ implementation. But if someone wants to reimplement the whole thing...

JonasIsensee avatar Sep 21 '18 13:09 JonasIsensee

A wrapper should definitely be in a separate package. A native version can live here.

KristofferC avatar Sep 21 '18 13:09 KristofferC

I can attest that nmslib is extremely fast - takes on the order of a few milliseconds to perform a search in millions of samples for a dimensionality >100. That's a couple of orders of magnitude faster than this implementation, which is fast only for a small number of dimensions.

zgornel avatar Oct 20 '18 09:10 zgornel

Thanks @Datseris for the links awesome :) Reffering to #69, the simplest (and maybe fastest) approach would be to implement a minimalistic ann directly in Julia ... I will look into this in the coming months if time allows.

zgornel avatar Oct 20 '18 09:10 zgornel

I am working on an implementation of the NN Descent algorithm from this paper: https://github.com/dillondaudert/NNDescent.jl . If it's something that people think is appropriate for NearestNeighbors.jl, I can look to merge it into this repo once I have a reasonably efficient implementation done.

dillondaudert avatar Oct 20 '18 17:10 dillondaudert

There seems to be many pieces of work happening in parallel; I think it is more fitting for a JuliaNeighbors to be created (hopefully with a better name, as this JuliaProperty org name is getting beyond boring).

Datseris avatar Oct 20 '18 18:10 Datseris

@zgornel
I've already invested some time into my own implementation of the hnsw algorithm in Julia. (Working, but further optimization required ) I plan to put it online within the next few weeks.

@Datseris : This naming convention may be boring but I'd say that they're at least quite telling.

JonasIsensee avatar Oct 21 '18 08:10 JonasIsensee

booooooooooooooooooooooooooring. Why don't we name it TheHoodOfJulia?

Datseris avatar Oct 21 '18 08:10 Datseris

@JonasIsensee great, I can help with testing and further optimization if needed.

zgornel avatar Oct 21 '18 08:10 zgornel