xoak
xoak copied to clipboard
Add k-d tree index
It would be interesting to test/benchmark the ball tree + haversine approach versus the kd-tree + conversion lat/lon spherical -> xyz cartersian.
The second approach is also used by libpysal, see here.
Since we already rely on scikit-learn for the ball tree, would it make sense to reuse scikit-learn's KDTree instead of Scipy's cKDTree?
would it make sense to reuse scikit-learn's KDTree instead of Scipy's cKDTree?
Yes, definitely!
It would be interesting to test/benchmark the ball tree + haversine approach versus the kd-tree + conversion lat/lon spherical -> xyz cartersian.
I'm not sure I understand why we should not also do the ball tree benchmark with lat/lon to spherical and the kd-tree with haversine. I'd go for a matrix like
tree:
- ball tree
- kd-tree
transform_metric:
- (deg2rad, haversine)
- (to sphere, euclidian)
(Sorry, hit send to early.)
I don't think we can use k-d tree with lat/lon haversine: https://news.ycombinator.com/item?id=9281998
The rest of the matrix is valid. Ball tree works with any metric that respects the triangle inequality.
Actually, there is already detailed comparison + benchmarks in Jake's blog post. But it'd still be interesting to see it applied to NEMO/FESOM spatial data patterns and the metrics/conversions used here.
Mmm now I'm confused about the validity of kd-tree + haversine. I'm wondering why haversine is not in the list of valid metrics for scikit-learn's KDTree.
We ~~should~~ might also consider the implementation in pykdtree.