kNN.jl
Use FLANN
FLANN (http://www.cs.ubc.ca/research/flann/) is one of the most widely used libraries for approximate nearest neighbor search.
It is fast and reliable, is available in Linux distributions and Homebrew, and has a C interface.
Yes, we should definitely use FLANN.
There are also a few other libraries we will want to look into at some point: http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neighbours-intro/
This post is actually more informative: http://radimrehurek.com/2013/12/performance-shootout-of-nearest-neighbours-contestants/
From this post, it appears to me that FLANN is the most reasonable choice at this point.
I would suggest having a separate package (say FLANN.jl) as a wrapper, and having this package depend on it.
Yes, I think that's the right approach.
Using FLANN requires manual memory management, because it maintains an in-memory index. How does that fit into the proposed workflow of creating a model and using it for multiple predictions? It would require either releasing the resources at the end of the model's use or recalculating the index on every search.
Just treat the FLANN index like we treat any other library object that holds external resources (e.g. database connections).
We require the user to free the index when they have finished using it.
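To make the "free it when done" workflow concrete, here is a minimal sketch of how a model could build an index once, reuse it for several queries, and release it afterwards. Everything in it is an assumption for illustration: `MockIndex`, `build_index`, `nearest`, and `with_index` are hypothetical names, and the brute-force search merely stands in for a FLANN query. None of this is the FLANN or FLANN.jl API.

```julia
# Hypothetical stand-in for a FLANN index handle; not the FLANN.jl API.
mutable struct MockIndex
    data::Matrix{Float64}   # one point per column
    open::Bool              # false once the index has been freed
end

# Build step: a real wrapper would build the native index here.
build_index(data::Matrix{Float64}) = MockIndex(data, true)

# Release step: a real wrapper would free the native memory here.
function Base.close(idx::MockIndex)
    idx.open = false
    return nothing
end

# Brute-force 1-nearest-neighbor lookup, standing in for an index query.
function nearest(idx::MockIndex, q::Vector{Float64})
    idx.open || error("index has already been freed")
    best, best_d = 0, Inf
    for j in 1:size(idx.data, 2)
        d = sum(abs2, idx.data[:, j] .- q)   # squared Euclidean distance
        if d < best_d
            best, best_d = j, d
        end
    end
    return best
end

# Do-block form: the index is freed even if the body throws.
function with_index(f, data::Matrix{Float64})
    idx = build_index(data)
    try
        return f(idx)
    finally
        close(idx)
    end
end

points = [0.0 1.0 5.0;
          0.0 1.0 5.0]
queries = [[0.9, 1.2], [4.0, 4.5]]

# The index is built once and reused for every query in the block.
labels = with_index(points) do idx
    [nearest(idx, q) for q in queries]
end
```

This mirrors how Julia code typically handles files with `open(...) do ... end`: the index lives for the duration of the block, so multiple predictions share one index without recalculating it. Users who need a longer-lived model could instead keep the handle and call `close` explicitly.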