NearestNeighbors.jl add `knninrange` to uniformly sample k points in range within a given range

Fixes https://github.com/KristofferC/NearestNeighbors.jl/issues/210.

@juliohm, mind trying this out?

Nov 12 '25 13:11 KristofferC

Tested locally and it works as expected. Thank you!

Looking forward to the next release.

Nov 12 '25 22:11 juliohm

I just want to say that it won't save you much time, because it still checks all points in range; it is just that it only store a random subset of size k of all these points in that range. But that is maybe what you wanted anyway.

Nov 12 '25 22:11 KristofferC

I guess the main benefit would be the pre-allocation of k indices and knninrange! updates in-place. I am curious to benchmark this new function against our current implementation (combination of knn with inrange).

Nov 12 '25 22:11 juliohm

I guess the main benefit would be the pre-allocation of k indices and knninrange! updates in-place.

Note that there is also an inrange! function that take an input array to mutate. The type of algorithm used here is typically for when the memory for storing the full output set is quite large and you cannot afford to store them all https://en.wikipedia.org/wiki/Reservoir_sampling:

The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory.

So I am actually not sure how useful this is in practice vs just allocating an output idx vector an calling inrange! (unless you will get a huge number of points returned from it).

Nov 13 '25 11:11 KristofferC

You are probably right.

What is your recommendation for the fastest method to retrieve up to k points inside a range of radius r?

My current implementation looks like the following:

inds, dists = knn(tree, x, k, true)

inds[dists .≤ r]

Nov 13 '25 11:11 juliohm

You won't get a uniformly random sample from the points inside that range with that code but I don't know if you care about that. For uniformity you could do something like:

  buffer = Int[]
  idxs = inrange!(buffer, tree, x, r)
  sample = rand(rng, idxs, min(k, length(idxs)))

(which would be just a version of knninrange! that allocates a bit more memory).

Also, in your code you allocate idx_dists = dists .≤ r and inds[idx_dists]. You could pass around a filter like (ind for (ind, dist) in zip(inds, dists) if dist ≤ r) to avoid allocating anything and just iterate over that.

Nov 13 '25 13:11 KristofferC

(which would be just a version of knninrange! that allocates a bit more memory).

Thank you. I believe knninrange! is what I need then.

Nov 13 '25 13:11 juliohm