xoak
Checkout pyresample
As noted by @koldunovn: We should have a look at https://pyresample.readthedocs.io/en/latest/
pyresample uses pykdtree under the hood, which, looking at the benchmarks, is faster than scipy.spatial.cKDTree. It can support parallel queries (using OpenMP) but seems to only support Euclidean distances.
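On the Euclidean-only limitation: a common workaround for lat/lon data is to map the points onto the unit sphere and query a Euclidean k-d tree in 3-D, since the chord length between two points grows monotonically with their great-circle distance. The sketch below uses scipy.spatial.cKDTree; pykdtree's KDTree exposes a similar query method, but it is not used here. The function names are hypothetical and this is not how pyresample or xoak necessarily implement it.

```python
import numpy as np
from scipy.spatial import cKDTree


def latlon_to_unit_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to points on the unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.column_stack(
        (np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat))
    )


def nearest_on_sphere(grid_lat, grid_lon, query_lat, query_lon):
    """Return (indices, central angles in radians) of the nearest grid points."""
    tree = cKDTree(latlon_to_unit_xyz(grid_lat, grid_lon))
    chord, idx = tree.query(latlon_to_unit_xyz(query_lat, query_lon), k=1)
    # A chord of length c on the unit sphere spans a central angle 2 * arcsin(c / 2).
    angle = 2.0 * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0))
    return idx, angle


# Four grid points and two query points (degrees).
grid_lat = np.array([0.0, 0.0, 10.0, 50.0])
grid_lon = np.array([0.0, 10.0, 0.0, 50.0])
idx, angle = nearest_on_sphere(grid_lat, grid_lon, [1.0, 49.0], [9.0, 51.0])
```

Multiplying the returned angle by the Earth's radius gives an approximate great-circle distance on a spherical Earth.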
is faster than scipy.spatial.cKDTree
Maybe not anymore? https://github.com/ESM-VFC/xoak/issues/12#issuecomment-672756670
Hi @willirath and @benbovy. I'm one of the core developers on pyresample (also @mraspaud and @pnuu). I also tried to start the geoxarray library, whose purpose I hope will overlap with xoak, rioxarray (CC @snowman2), and pyresample (depending on and/or supplementing these libraries). Shortly after trying to start the project I realized that holding on to coordinate information in xarray objects/accessors was not as "clean" as I thought it was going to be and I basically gave up for a while. But I'd like to bring the project back up, if not in implementation then at least to figure out the concepts that I was originally hoping to cover.
@raybellwaves has pointed xoak out to me a couple times now and I'm very impressed, but I'm really lacking in my understanding of indexers (xarray or pandas). I'm hoping you can clear some things up for me about xoak and the future you see. I've been trying to follow https://github.com/pydata/xarray/pull/4979 and https://github.com/pydata/xarray/pull/5102 and I'm wondering how this work will affect xoak? Would it make things like the xoak accessor's .sel unnecessary? Is xarray interested in collecting a series of indexers like those in xoak? What about the indexer registry and custom indexers? If I want to make an indexer based on pyresample/pykdtree that xarray can use, would you recommend I look at adding it to xoak? Or add it to pyresample and have it register it with xoak? Or just keep it in pyresample (or geoxarray) and have interfaces for users to get to it? How do xoak's indexers perform with thousands of points (ex. 10000 x 6400 - large swath from the VIIRS satellite instrument)?
What I'd really like to get to eventually with geoxarray (or the geospatial xarray community) is indexers that allow for CRS-aware selection/indexing, consistent spatial resampling interfaces, and consistent CRS/projection/coordinate metadata handling and conversion. That's why I'm currently trying to figure out how I can take what I know from Satpy/Pyresample and rioxarray and merge it with the new information I'm gathering from xoak.
Side note: Regarding pykdtree performance, I'd like to see how these benchmarks were run. @mraspaud just recently ran some tests and found similar performance to pykdtree's original findings in its README. There is a chance that if pykdtree was installed in a certain way the OpenMP library wasn't included and used. Also, we've been using pykdtree underneath some dask map_blocks/delayed calls in the Satpy library for resampling and have been slowly moving the functionality to pyresample.
Hi @djhoese!
Hope to clarify things a bit about Xarray flexible indexes and Xoak here.
We started Xoak as a preliminary project before starting to work on flexible indexes in Xarray. At some point, some of Xoak's specific features will likely reuse Xarray API:
- We plan to make Xoak indexes compatible with Xarray, i.e., eventually they will inherit from xarray.Index added in https://github.com/pydata/xarray/pull/5102.
- Xoak's index registering system will eventually be dropped in favor of Xarray's index registering system, e.g., based on entrypoints like the newly refactored Xarray I/O backend system.
- Xoak accessor's .sel might become unnecessary at some point (it will depend on whether Xarray's .sel will support Xoak's features like providing chunked indexers). However, Xoak will still provide an accessor so that we can query indexes for operations beyond simple data selection like, e.g., return the distances to the nearest neighbors, k-nearest neighbor selection, etc.
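To make the last point more concrete, here is a minimal sketch of a k-nearest-neighbor query that also returns distances and processes the query points chunk by chunk. It is not Xoak's implementation: plain NumPy chunks stand in for chunked (dask) indexers, scipy.spatial.cKDTree stands in for whichever tree an index wraps, and query_in_chunks is a hypothetical helper name.

```python
import numpy as np
from scipy.spatial import cKDTree


def query_in_chunks(tree, points, k=1, chunk_size=100_000):
    """Query `tree` one chunk at a time; return (distances, indices) for all points."""
    points = np.asarray(points, dtype=float)
    n_chunks = max(1, int(np.ceil(len(points) / chunk_size)))
    dists, idxs = [], []
    for chunk in np.array_split(points, n_chunks):
        d, i = tree.query(chunk, k=k)  # each result has one row per query point
        dists.append(d)
        idxs.append(i)
    return np.concatenate(dists), np.concatenate(idxs)


rng = np.random.default_rng(0)
data = rng.random((1000, 2))      # points stored in the index
queries = rng.random((250, 2))    # points to look up
tree = cKDTree(data)

# Three nearest neighbors per query point, processed in chunks of at most 100.
dist, idx = query_in_chunks(tree, queries, k=3, chunk_size=100)
```

Because each chunk is queried independently, the same loop could in principle be handed to a parallel scheduler; that part is left out of the sketch.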
What I'd really like to get to eventually with geoxarray (or the geospatial xarray community) is indexers that allow for CRS-aware selection/indexing, consistent spatial resampling interfaces, and consistent CRS/projection/coordinate metadata handling and conversion.
That's definitely the goal with Xarray's flexible indexes refactoring. Once this is ready, you should be able to create your own index class (in geoxarray or any package reused in the geospatial xarray community) inheriting from xarray.Index, which would handle the CRS/projection/coordinate (meta)data and provide its own implementation of selection, indexing, alignment (maybe also interpolation, groupby, etc?) for Xarray objects.
I think fully CRS-aware indexes are out of scope for Xoak, which is more focused on "basic" indexes for irregular data (including some indexes for lat/lon coordinates), but maybe such CRS-aware index could be itself built on top of one of the indexes that Xoak provides?
How do xoak's indexers perform with thousands of points (ex. 10000 x 6400 - large swath from the VIIRS satellite instrument)?
That's a good question. We haven't tested Xoak with many kinds of datasets yet. For example, using the s2_point index for lat/lon unchunked data, it's possible to index a few tens of millions of points (and query a few hundred thousand points) within seconds.
Xoak's index registering system will eventually be dropped in favor of Xarray's index registering system, e.g., based on entrypoints like the newly refactored Xarray I/O backend system.
Does that mean that this entrypoint system doesn't exist yet for index registration?
Does that mean that this entrypoint system doesn't exist yet for index registration?
Indeed, in Xarray there's no API or system yet for registering or even using custom indexes other than pandas.Index (https://github.com/pydata/xarray/pull/5102 has just been merged but it's only internal refactoring for now).
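For readers unfamiliar with entry-point based registration, the general mechanism can be sketched with the standard library alone. Everything named here is an assumption for illustration: the group name "example.indexes", the discover_indexes helper and the IndexRegistry class are hypothetical, and neither Xarray nor Xoak is claimed to define them.

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group name, chosen for this sketch only.
# A package would advertise an index class in its pyproject.toml, e.g.:
#   [project.entry-points."example.indexes"]
#   my_kdtree = "mypackage.indexes:MyKDTreeIndex"
INDEX_GROUP = "example.indexes"


def discover_indexes(group=INDEX_GROUP):
    """Return {name: loaded object} for every installed entry point in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable collection of entry points.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


class IndexRegistry:
    """Registry seeded from entry points, with optional manual registration."""

    def __init__(self, group=INDEX_GROUP):
        self._indexes = dict(discover_indexes(group))

    def register(self, name, cls):
        if name in self._indexes:
            raise KeyError(f"index {name!r} is already registered")
        self._indexes[name] = cls

    def __getitem__(self, name):
        return self._indexes[name]

    def __contains__(self, name):
        return name in self._indexes
```

With this pattern a third-party package only has to declare the entry point; the consuming library finds the class at import time without either package importing the other directly.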