xoak icon indicating copy to clipboard operation
xoak copied to clipboard

Add k-d tree index

Open benbovy opened this issue 4 years ago • 4 comments

It would be interesting to test/benchmark the ball tree + haversine approach versus the kd-tree + conversion lat/lon spherical -> xyz cartersian.

The second approach is also used by libpysal, see here.

Since we already rely on scikit-learn for the ball tree, would it make sense to reuse scikit-learn's KDTree instead of Scipy's cKDTree?

benbovy avatar Jun 29 '20 11:06 benbovy

would it make sense to reuse scikit-learn's KDTree instead of Scipy's cKDTree?

Yes, definitely!

It would be interesting to test/benchmark the ball tree + haversine approach versus the kd-tree + conversion lat/lon spherical -> xyz cartersian.

I'm not sure I understand why we should not also do the ball tree benchmark with lat/lon to spherical and the kd-tree with haversine. I'd go for a matrix like

tree:
  - ball tree
  - kd-tree
transform_metric:
  - (deg2rad, haversine)
  - (to sphere, euclidian)

(Sorry, hit send to early.)

willirath avatar Jun 29 '20 11:06 willirath

I don't think we can use k-d tree with lat/lon haversine: https://news.ycombinator.com/item?id=9281998

The rest of the matrix is valid. Ball tree works with any metric that respects the triangle inequality.

benbovy avatar Jun 29 '20 12:06 benbovy

Actually, there is already detailed comparison + benchmarks in Jake's blog post. But it'd still be interesting to see it applied to NEMO/FESOM spatial data patterns and the metrics/conversions used here.

benbovy avatar Jun 29 '20 12:06 benbovy

Mmm now I'm confused about the validity of kd-tree + haversine. I'm wondering why haversine is not in the list of valid metrics for scikit-learn's KDTree.

We ~~should~~ might also consider the implementation in pykdtree.

benbovy avatar Jun 29 '20 12:06 benbovy