1.0 road map
This was my very first package and looking at this with a set of more experienced eyes there are quite a few changes I want to make to APIs etc. Some of those will be somewhat breaking but I think it is worth it.
- [ ] https://github.com/KristofferC/NearestNeighbors.jl/pull/167
- [ ] Rethink the
skipfunction. Right now it only takes a single index but it isn't really documented and it is unclear if this is the best API. - [ ] Remove
DataFreeTree: I don't see any use of this in JuliaHub and I am not really satisfied with its implementation. - [ ] Ensure all public API is documented, https://github.com/KristofferC/NearestNeighbors.jl/issues/147
- [x] Test benchmarks and have them run on CI
- [ ] Some public API for investigating the tree like
bounding_box,isleafetc? - [x] Consider multithreading tree construction
- [ ] Go through and triage open issues and PRs
- [ ] Remove the sorting kwargs, users can just sort themselves.
- [ ] Add
knn!to be able to pass in a vector to be updated. - [ ] Consider if "vectorized" functions of
knnandinrangeare needed or if user can just write the loop themselves if they want. Would remove the need for https://github.com/KristofferC/NearestNeighbors.jl/pull/167. - [x] Periodic versions of the trees?
- [x] Better docs?
More to come
CC some people that I know use this package: @juliohm, @fredrikekre
Wonder if there are any plans for adding multidimensional inrange search? Implementation is very short: https://github.com/JuliaAPlavin/FlexiJoins.jl/blob/00000000ca2fe5eace0b881caa8d2258ca6154ee/src/nearestneighbors.jl#L32-L52 https://github.com/KristofferC/NearestNeighbors.jl/pull/150
Also, what about supporting arbitrary types, at least for BallTree? Stuff like strings or even custom objects can be useful with distance searches. https://github.com/JuliaAPlavin/FlexiJoins.jl/issues/8
Wonder if there are any plans for adding multidimensional inrange search?
Sounds like a good idea!
Also, what about supporting arbitrary types, at least for BallTree?
I don't think that should be too hard, we would "just" have to relax some assumptions of what types are used for the data etc.
It would be nice to support a more general notion of objects as pointed out by @aplavin.
In the meantime, the V<:AbstractVector requirement could be dropped:
A vector of vectors with fixed dimensionality nd, i.e., data should be a Vector{V} where V is a subtype of AbstractVector with defined length(V). For example a Vector{V} where V = SVector{3, Float64} is ok because length(V) = 3 is defined.
We cannot build trees with tuples currently, even though they also have static length.
May I also request/suggest something? I don't know how complicated of a re-write this would require, but it would be amazing to be able to add points to the tree. This is a big pain point for me in Vecchia.jl, my package implement the scalable GP approximation that is clearly winning the popularity contest.
For that method (and I would guess many sparse preconditioner design problems), you have to pick nearest neighbors from previously enumerated points. As of now, the workflow for that looks like this:
pts = [...] # Vector{SVector{D,Float64}}
knn_past_indices = map(2:length(pts)) do j
tree = KDTree(pts[1:(j-1)])
knn(tree, pts[j])[1]
end
NearestNeighbors.jl is so fast that even this very redundant method does pretty well until length(pts) reaches, say, 5k. But it would really be incredible to be able to do something like this:
pts = [...] # Vector{SVector{D,Float64}}
tree = KDTree([pts[1]])
knn_past_indices = map(2:length(pts)) do j
ixs = knn(tree, pts[j])[1]
add_point!(tree, pts[j])
iszero(rem(j, 100)) && rebalance!(tree)
ixs
end
There is one package that implements adaptive trees (AdaptiveKDTrees.jl), but it doesn't look like it does any balancing and I have hit a lot of performance issues with it and am about to take it out of my code. Otherwise, I am not aware of any Julia package that offers dynamic and re-balancing K-D trees (although I'd love to be corrected!).
Also, I just feel like I should say: I know I am not entitled to any time from anybody in this package or otherwise, and I hope this note doesn't read as an entitled user asking for more---I absolutely love this package and am so grateful for the functionality it presently offers. Just floating the thought! I am happy to try and devote some of my own time to trying this, but in that case I would greatly appreciate pointers to papers and other relevant resources for understanding routines for efficient dynamic tree stuff.
EDIT: I see this is mentioned in #44, sorry!