NearestNeighbors.jl icon indicating copy to clipboard operation
NearestNeighbors.jl copied to clipboard

In-place version of knn

Open juliohm opened this issue 6 years ago • 6 comments

Can I submit a PR to add a knn! function that fills the first arguments instead of allocating and returning? I have to call knn 10^6 times in my applications, that would make a huge performance boost.

Also, do you have a better API than the following?

knn!(inds, dists, ...)

Maybe define a

struct KNNResult{T<:Number}
  inds::Vector{Int}
  dists::Vector{T}
end

and pass that instead of passing two arguments to fill?

knn!(result, ...) # in-place
r = knn(...) # ask users to unpack manually: r.inds, r.dists

juliohm avatar Jan 11 '19 12:01 juliohm

Yeah, I've thought about this too.

Just to check, you know that you can query knn against multiple points in one call, which is faster than one at a time?

KristofferC avatar Jan 11 '19 14:01 KristofferC

Yes, I know. It is just that I need to do it point by point in this case. In places where I can pass a bunch of points, I am already doing it. :+1:

juliohm avatar Jan 11 '19 19:01 juliohm

Do you have some performance benchmarks? I think I tried this a while ago and while there was a speedup I don't recall it as "huge".

KristofferC avatar Jan 11 '19 23:01 KristofferC

I don't have a benchmark yet, can try to run it locally and report here :+1:

juliohm avatar Jan 12 '19 00:01 juliohm

seems like this issue was closed? I wonder what's the progress of in-place KNN, in particular in-place KDTree?

I have a problem that needs to do KNN (for density estimation) millions of times on matrices with the same size. Ideally, I could create a KDTree, then re-use the memory for new data, e.g.

kdtree = KDTree(data)

set!(kdtree, newdata)    # in-place update of kdtree object with new data

Thanks.

githubtomtom avatar Dec 25 '20 12:12 githubtomtom

I think this feature will be useful. Querying knn against multiple points certainly helps, but if we do it $n$ times on $m$ querying points, it will allocate $n$ vectors of length $m$. When $n$ and $m$ are large, this will cause more frequency garbage collection.

wsshin avatar Sep 30 '22 19:09 wsshin