NearestNeighbors.jl icon indicating copy to clipboard operation
NearestNeighbors.jl copied to clipboard

1.0 road map

Open KristofferC opened this issue 1 year ago • 7 comments

This was my very first package and looking at this with a set of more experienced eyes there are quite a few changes I want to make to APIs etc. Some of those will be somewhat breaking but I think it is worth it.

  • [ ] https://github.com/KristofferC/NearestNeighbors.jl/pull/167
  • [ ] Rethink the skip function. Right now it only takes a single index but it isn't really documented and it is unclear if this is the best API.
  • [ ] Remove DataFreeTree: I don't see any use of this in JuliaHub and I am not really satisfied with its implementation.
  • [ ] Ensure all public API is documented, https://github.com/KristofferC/NearestNeighbors.jl/issues/147
  • [x] Test benchmarks and have them run on CI
  • [ ] Some public API for investigating the tree like bounding_box, isleaf etc?
  • [x] Consider multithreading tree construction
  • [ ] Go through and triage open issues and PRs
  • [ ] Remove the sorting kwargs, users can just sort themselves.
  • [ ] Add knn! to be able to pass in a vector to be updated.
  • [ ] Consider if "vectorized" functions of knn and inrange are needed or if user can just write the loop themselves if they want. Would remove the need for https://github.com/KristofferC/NearestNeighbors.jl/pull/167.
  • [x] Periodic versions of the trees?
  • [x] Better docs?

More to come

KristofferC avatar Jun 17 '24 10:06 KristofferC

CC some people that I know use this package: @juliohm, @fredrikekre

KristofferC avatar Jun 17 '24 10:06 KristofferC

Wonder if there are any plans for adding multidimensional inrange search? Implementation is very short: https://github.com/JuliaAPlavin/FlexiJoins.jl/blob/00000000ca2fe5eace0b881caa8d2258ca6154ee/src/nearestneighbors.jl#L32-L52 https://github.com/KristofferC/NearestNeighbors.jl/pull/150

aplavin avatar Jul 08 '24 10:07 aplavin

Also, what about supporting arbitrary types, at least for BallTree? Stuff like strings or even custom objects can be useful with distance searches. https://github.com/JuliaAPlavin/FlexiJoins.jl/issues/8

aplavin avatar Jul 08 '24 10:07 aplavin

Wonder if there are any plans for adding multidimensional inrange search?

Sounds like a good idea!

KristofferC avatar Jul 08 '24 11:07 KristofferC

Also, what about supporting arbitrary types, at least for BallTree?

I don't think that should be too hard, we would "just" have to relax some assumptions of what types are used for the data etc.

KristofferC avatar Jul 08 '24 11:07 KristofferC

It would be nice to support a more general notion of objects as pointed out by @aplavin.

In the meantime, the V<:AbstractVector requirement could be dropped:

A vector of vectors with fixed dimensionality nd, i.e., data should be a Vector{V} where V is a subtype of AbstractVector with defined length(V). For example a Vector{V} where V = SVector{3, Float64} is ok because length(V) = 3 is defined.

We cannot build trees with tuples currently, even though they also have static length.

juliohm avatar Apr 08 '25 14:04 juliohm

May I also request/suggest something? I don't know how complicated of a re-write this would require, but it would be amazing to be able to add points to the tree. This is a big pain point for me in Vecchia.jl, my package implement the scalable GP approximation that is clearly winning the popularity contest.

For that method (and I would guess many sparse preconditioner design problems), you have to pick nearest neighbors from previously enumerated points. As of now, the workflow for that looks like this:

pts   = [...] # Vector{SVector{D,Float64}}
knn_past_indices = map(2:length(pts)) do j
  tree = KDTree(pts[1:(j-1)])
  knn(tree, pts[j])[1]
end

NearestNeighbors.jl is so fast that even this very redundant method does pretty well until length(pts) reaches, say, 5k. But it would really be incredible to be able to do something like this:

pts  = [...] # Vector{SVector{D,Float64}}
tree = KDTree([pts[1]])
knn_past_indices = map(2:length(pts)) do j
  ixs = knn(tree, pts[j])[1]
  add_point!(tree, pts[j])
  iszero(rem(j, 100)) && rebalance!(tree)
  ixs
end 

There is one package that implements adaptive trees (AdaptiveKDTrees.jl), but it doesn't look like it does any balancing and I have hit a lot of performance issues with it and am about to take it out of my code. Otherwise, I am not aware of any Julia package that offers dynamic and re-balancing K-D trees (although I'd love to be corrected!).

Also, I just feel like I should say: I know I am not entitled to any time from anybody in this package or otherwise, and I hope this note doesn't read as an entitled user asking for more---I absolutely love this package and am so grateful for the functionality it presently offers. Just floating the thought! I am happy to try and devote some of my own time to trying this, but in that case I would greatly appreciate pointers to papers and other relevant resources for understanding routines for efficient dynamic tree stuff.

EDIT: I see this is mentioned in #44, sorry!

cgeoga avatar May 13 '25 20:05 cgeoga