NearestNeighbors.jl
NearestNeighbors.jl copied to clipboard
In-place version of knn
Can I submit a PR to add a knn!
function that fills the first arguments instead of allocating and returning? I have to call knn
10^6 times in my applications, that would make a huge performance boost.
Also, do you have a better API than the following?
knn!(inds, dists, ...)
Maybe define a
struct KNNResult{T<:Number}
inds::Vector{Int}
dists::Vector{T}
end
and pass that instead of passing two arguments to fill?
knn!(result, ...) # in-place
r = knn(...) # ask users to unpack manually: r.inds, r.dists
Yeah, I've thought about this too.
Just to check, you know that you can query knn against multiple points in one call, which is faster than one at a time?
Yes, I know. It is just that I need to do it point by point in this case. In places where I can pass a bunch of points, I am already doing it. :+1:
Do you have some performance benchmarks? I think I tried this a while ago and while there was a speedup I don't recall it as "huge".
I don't have a benchmark yet, can try to run it locally and report here :+1:
seems like this issue was closed? I wonder what's the progress of in-place KNN, in particular in-place KDTree?
I have a problem that needs to do KNN (for density estimation) millions of times on matrices with the same size.
Ideally, I could create a KDTree
, then re-use the memory for new data, e.g.
kdtree = KDTree(data)
set!(kdtree, newdata) # in-place update of kdtree object with new data
Thanks.
I think this feature will be useful. Querying knn
against multiple points certainly helps, but if we do it $n$ times on $m$ querying points, it will allocate $n$ vectors of length $m$. When $n$ and $m$ are large, this will cause more frequency garbage collection.