NearestNeighbors.jl icon indicating copy to clipboard operation
NearestNeighbors.jl copied to clipboard

add `knninrange` to uniformly sample k points in range within a given range

Open KristofferC opened this issue 1 month ago • 7 comments

Fixes https://github.com/KristofferC/NearestNeighbors.jl/issues/210.

@juliohm, mind trying this out?

KristofferC avatar Nov 12 '25 13:11 KristofferC

Tested locally and it works as expected. Thank you!

Looking forward to the next release.

juliohm avatar Nov 12 '25 22:11 juliohm

I just want to say that it won't save you much time, because it still checks all points in range; it is just that it only store a random subset of size k of all these points in that range. But that is maybe what you wanted anyway.

KristofferC avatar Nov 12 '25 22:11 KristofferC

I guess the main benefit would be the pre-allocation of k indices and knninrange! updates in-place. I am curious to benchmark this new function against our current implementation (combination of knn with inrange).

juliohm avatar Nov 12 '25 22:11 juliohm

I guess the main benefit would be the pre-allocation of k indices and knninrange! updates in-place.

Note that there is also an inrange! function that take an input array to mutate. The type of algorithm used here is typically for when the memory for storing the full output set is quite large and you cannot afford to store them all https://en.wikipedia.org/wiki/Reservoir_sampling:

The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory.

So I am actually not sure how useful this is in practice vs just allocating an output idx vector an calling inrange! (unless you will get a huge number of points returned from it).

KristofferC avatar Nov 13 '25 11:11 KristofferC

You are probably right.

What is your recommendation for the fastest method to retrieve up to k points inside a range of radius r?

My current implementation looks like the following:

inds, dists = knn(tree, x, k, true)

inds[dists .≤ r]

juliohm avatar Nov 13 '25 11:11 juliohm

You won't get a uniformly random sample from the points inside that range with that code but I don't know if you care about that. For uniformity you could do something like:

  buffer = Int[]
  idxs = inrange!(buffer, tree, x, r)
  sample = rand(rng, idxs, min(k, length(idxs)))

(which would be just a version of knninrange! that allocates a bit more memory).

Also, in your code you allocate idx_dists = dists .≤ r and inds[idx_dists]. You could pass around a filter like (ind for (ind, dist) in zip(inds, dists) if dist ≤ r) to avoid allocating anything and just iterate over that.

KristofferC avatar Nov 13 '25 13:11 KristofferC

(which would be just a version of knninrange! that allocates a bit more memory).

Thank you. I believe knninrange! is what I need then.

juliohm avatar Nov 13 '25 13:11 juliohm