add `knninrange` to uniformly sample k points in range within a given range
Fixes https://github.com/KristofferC/NearestNeighbors.jl/issues/210.
@juliohm, mind trying this out?
Tested locally and it works as expected. Thank you!
Looking forward to the next release.
I just want to say that it won't save you much time, because it still checks all points in range; it is just that it only store a random subset of size k of all these points in that range. But that is maybe what you wanted anyway.
I guess the main benefit would be the pre-allocation of k indices and knninrange! updates in-place. I am curious to benchmark this new function against our current implementation (combination of knn with inrange).
I guess the main benefit would be the pre-allocation of k indices and knninrange! updates in-place.
Note that there is also an inrange! function that take an input array to mutate. The type of algorithm used here is typically for when the memory for storing the full output set is quite large and you cannot afford to store them all https://en.wikipedia.org/wiki/Reservoir_sampling:
The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory.
So I am actually not sure how useful this is in practice vs just allocating an output idx vector an calling inrange! (unless you will get a huge number of points returned from it).
You are probably right.
What is your recommendation for the fastest method to retrieve up to k points inside a range of radius r?
My current implementation looks like the following:
inds, dists = knn(tree, x, k, true)
inds[dists .≤ r]
You won't get a uniformly random sample from the points inside that range with that code but I don't know if you care about that. For uniformity you could do something like:
buffer = Int[]
idxs = inrange!(buffer, tree, x, r)
sample = rand(rng, idxs, min(k, length(idxs)))
(which would be just a version of knninrange! that allocates a bit more memory).
Also, in your code you allocate idx_dists = dists .≤ r and inds[idx_dists]. You could pass around a filter like (ind for (ind, dist) in zip(inds, dists) if dist ≤ r) to avoid allocating anything and just iterate over that.
(which would be just a version of
knninrange!that allocates a bit more memory).
Thank you. I believe knninrange! is what I need then.