xoak icon indicating copy to clipboard operation
xoak copied to clipboard

PROPOSAL: pygeos & rtree engine for geometry based selection

Open snowman2 opened this issue 3 years ago • 5 comments

I am wondering if adding support for these R-Tree based selectors would be of interest:

  • https://pygeos.readthedocs.io/en/latest/strtree.html
  • https://rtree.readthedocs.io/en/latest/tutorial.html

It would allow for selecting the points with a geometry as input.

snowman2 avatar Jan 06 '21 20:01 snowman2

Closing for #12

snowman2 avatar Jan 06 '21 20:01 snowman2

I am wondering if adding support for these R-Tree based selectors would be of interest

Yes this would be nice additions! We could add adapters for those R-Tree indexes either in xoak or in 3rd party packages. Adding it in xoak would be a reasonable option for now (dependencies are optional for all adapters implemented in xoak). I'm not sure how the number of adapters would grow in the future, though.

It would allow for selecting the points with a geometry as input.

This would be great too. That would require some work, though, as we need to extend the API of the index adapters and the xoak xarray accessors in order to support this. Open questions, I guess, are which "conventions" to use for representing those geometries within the xarray data model, and what are the advantages of using xarray instead of, e.g., geopandas for this use case.

(I'm re-opening this issue as I think it's worth discussing geometry based selection here - the topic #12 is too broad)

benbovy avatar Jan 08 '21 16:01 benbovy

I guess, are which "conventions" to use for representing those geometries within the xarray data model, and what are the advantages of using xarray instead of, e.g., geopandas for this use case.

This stems from: https://github.com/corteva/rioxarray/issues/202

Currently in rioxarray you can clip the data using a list of geometries: https://corteva.github.io/rioxarray/stable/examples/clip_geom.html

However, this only works for regular grids (i.e. 1 dimensional x and y coordinates that are equally spaced)

Coordinates:
  * y            (y) float64 ...
  * x            (x) float64 ...

The idea of this would be to enable clipping with 2 dimensional coordinates (usually latitude and longitude) with irregular grids:

  * latitude            (latitude, longitude) float64 ...
  * longitude           (latitude, longitude) float64 ... 

With the pygeos.STRTree, you can query for the coordinates that intersect the input geometries and get the indexes back to generate a mask you could use to clip/select data.

>>> import pygeos
>>> tree = pygeos.STRtree(pygeos.creation,points(xds.longitude, xds.latitude))
>>> tree.query_bulk([pygeos.box(2, 2, 4, 4), pygeos.box(5, 5, 6, 6)]).tolist()

In this scenario, there isn't really a need/benefit to use geopandas.

snowman2 avatar Jan 08 '21 16:01 snowman2

I see, clipping irregular data with a list of geometries is a nice example that xoak should support directly or indirectly.

xoak currently exposes a .sel method that is similar to xarray's .sel. Currently only point-wise indexing (i.e., using xarray objects as indexers) is possible, but range or rectangular selection (using slices) like in the simple examples given in https://corteva.github.io/rioxarray/stable/examples/clip_geom.html should be supported later. xoak's xarray accessor API could be extended with extra methods for more advanced but still common operations like clip with compound or complex geometries (although it might be too specific to fall in xoak's scope).

In addition, there's ongoing discussion/work in #32 to expose xarray-friendly wrappers for the query methods of the adapted index objects. The challenge here would be to support the variety of the queries supported by each index (e.g., pygeos.STRTree.query_bulk, sklearn.neighbors.KDTree.query_radius, etc.) through a common API.

BTW, I started writing bindings for the s2geometry library here that may provide similar capabilities than pygeos.STRTree for spherical coordinates.

benbovy avatar Jan 11 '21 08:01 benbovy

The challenge here would be to support the variety of the queries supported by each index (e.g., pygeos.STRTree.query_bulk, sklearn.neighbors.KDTree.query_radius, etc.) through a common API.

By common API, I mean a small set of query methods, one for each kind of query, rather than a single query method for all kinds of queries. Still, for the same kind of query there might be subtle differences from one index object to another.

benbovy avatar Jan 11 '21 08:01 benbovy