xpublish
Extending xpublish with new routers
In #29, @benbovy made it much easier to plug new routers into the Xpublish API. Currently, there are three default routers:
- Base - /keys, /dict, etc...
- Common - /versions and datasets/ endpoints
- Zarr - Zarr HTTP Store compatible endpoint
My intent with this issue is to explore what other routers / protocols may be applicable in this framework. Two potential candidates that have been floated before are:
- OPeNDAP / ERDDAP
- WMS
Curious what others think.
cc @lsetiawan, @benbovy, @ocefpaf, @rsignell-usgs
.to_dataframe().to_csv()?
I think it is a good idea to have built-in support in xpublish of established protocols like WMS, OPeNDAP, etc. alongside new protocols like Zarr. Xpublish is now very flexible so it should be pretty easy to add those protocols.
It looks like Xarray-leaflet provides something close to a WMS, but tightly coupled to ipyleaflet.
It looks like Xarray-leaflet provides something close to a WMS, but tightly coupled to ipyleaflet.
Mmm maybe closer to a "XYZ" service. I'm not very familiar with all variants of web mapping services.
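For readers less familiar with the distinction: a WMS request carries an arbitrary bounding box and image size, whereas an "XYZ" service addresses fixed tiles by zoom level, column and row. Below is a minimal sketch of the standard Web Mercator tile arithmetic that such a service relies on. The function name `tile_bounds` is made up for illustration and is not part of xpublish, Xarray-leaflet, or any other library mentioned in this thread.

```python
import math
from typing import Tuple


def tile_bounds(z: int, x: int, y: int) -> Tuple[float, float, float, float]:
    """Return (west, south, east, north) in degrees for XYZ tile (z, x, y).

    Assumes the common "slippy map" convention: Web Mercator tiles,
    with row 0 at the top (north) of the map.
    """
    n = 2 ** z  # number of tiles along each axis at this zoom level

    def lon(col: float) -> float:
        # Columns divide the 360 degrees of longitude evenly.
        return col / n * 360.0 - 180.0

    def lat(row: float) -> float:
        # Rows are evenly spaced in Mercator space, not in degrees.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


if __name__ == "__main__":
    # The single zoom-0 tile covers the whole Web Mercator extent.
    print(tile_bounds(0, 0, 0))
```

A tile endpoint would then select the slice of the dataset falling inside those bounds and render it to a fixed-size image, which is roughly the "rasterize at a specified pixel size" step mentioned further down.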
This is exciting!
OPeNDAP for sure would be awesome.
WMS is a map service that returns JPG, PNG images, not actual data, so is the thought to use Holoviz/Datashader to deliver the WMS images?
That seems like a natural fit, since the "rasterize" function already delivers geoimagery at a specified pixel size! https://github.com/holoviz/datashader/issues/831
I would start with OPeNDAP b/c there is a pure Python implementation of the server and it would be easier. I have no idea what it would take to do the same for ERDDAP; the docs and specs are not easy to navigate.
Other ideas (formats): cf-json, netcdf-ld, covjson.
There's a good discussion + links here: https://github.com/pangeo-data/pangeo-datastore/issues/3
I started playing with covjson and xpublish here: https://github.com/ESM-VFC/esm-vfc-api-demo/pull/11
I just had a look at Titiler, which already has a great set of features for serving geospatial raster data: multiple tile formats (raw data or image), multiple projections, wmts, etc.
It would be great if we could avoid reinventing the wheel here, i.e., depend on Titiler to create web map tiles dynamically from xarray Datasets and serve them via Xpublish!
@vincentsarago @kylebarron -- are Titiler's router factory classes part of the public API? Do you think it would be feasible to subclass and/or adapt it so that we can replace the URL (BaseFactory.path_dependency) by our dependencies to access the xarray Dataset being served? And/or replace COGReader with a custom reader/backend?
@benbovy thanks for the interest. To be honest, TiTiler is still in Alpha because it depends on 2 alpha/beta/rc modules: rio-tiler/cogeo-mosaic. I hope to publish the final versions of those packages before the end of the year, but for now I just want users to understand this.
are Titiler's router factory classes part of the public API?
Yes
Do you think it would be feasible to subclass and/or adapt it so that we can replace the URL (BaseFactory.path_dependency) by our dependencies to access the xarray Dataset being served?
I'm not sure I understand, but yes, we built TiTiler with modularity in mind.
And/or replace COGReader with a custom reader/backend?
This is the first goal of the TilerFactory https://github.com/developmentseed/titiler/blob/master/titiler/endpoints/stac.py#L73-L78
https://developmentseed.org/titiler/concepts/customization/
We've had discussions somewhere about Zarr vs Numpy for sending uncompressed data back to the client. https://github.com/zarr-developers/community/issues/37 I've thought about making a PR to explicitly add Zarr support to titiler, but haven't had time to pursue that idea
Thanks for the quick answers!
TiTiler is still in Alpha
That's perfectly fine! Xpublish is at an early stage of development too, and it is also built with modularity in mind so that we can easily experiment with new functionalities (pluggable routers).
We've had discussions somewhere about Zarr vs Numpy for sending uncompressed data back to the client. I've thought about making a PR to explicitly add Zarr support to titiler, but haven't had time to pursue that idea
That would be great, although here I'm thinking more about leveraging Titiler in order to extend Xpublish with "web mapping friendly" API endpoints (i.e., OGC-compliant endpoints, morecantile generated tiles, images/colormaps, etc.). Those endpoints would work with any data format that can be loaded with Xarray, and might also co-exist with other, non-geospatial API endpoints (e.g., serving raw multi-dimensional data using various protocols like OPeNDAP, Zarr, etc.).
I need to look at Titiler more in depth. My understanding is that a Titiler router factory relies on a dataset path (PathParams) + reader for all its endpoints, whereas an xpublish router relies on a get_dataset dependency (that directly returns an xarray.Dataset object) for all its endpoints. So I guess I need to figure out how to adapt one to the other.
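One way to picture "adapting one to the other" is a small adapter that turns a dataset-returning dependency into a reader that accepts (and ignores) a path. The sketch below is dependency-free and purely illustrative: `PathBasedEndpoints`, `reader_from_dependency`, and this `get_dataset` are hypothetical stand-ins and do not use the actual Titiler or xpublish APIs, and a plain dict stands in for an xarray.Dataset.

```python
from typing import Any, Callable, Dict

# Stand-in for xarray.Dataset so the sketch has no third-party imports.
Dataset = Dict[str, Any]


def get_dataset() -> Dataset:
    """Hypothetical dependency: returns the dataset being served."""
    return {"temperature": [[1.0, 2.0], [3.0, 4.0]]}


class PathBasedEndpoints:
    """Hypothetical 'path + reader' style: every call resolves a path."""

    def __init__(self, reader: Callable[[str], Dataset]) -> None:
        self.reader = reader

    def info(self, path: str) -> Dict[str, Any]:
        dataset = self.reader(path)
        return {"variables": sorted(dataset)}


def reader_from_dependency(dependency: Callable[[], Dataset]) -> Callable[[str], Dataset]:
    """Adapt a no-argument dataset dependency to the reader(path) shape."""

    def reader(_path: str) -> Dataset:
        # The path is ignored: the dataset comes from the dependency instead.
        return dependency()

    return reader


if __name__ == "__main__":
    endpoints = PathBasedEndpoints(reader_from_dependency(get_dataset))
    print(endpoints.info("unused-path"))
```

Whether the real factories allow this kind of substitution depends on how their path dependency and reader are wired together, which is exactly what would need checking.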
@kylebarron, dynamic tile services such as titiler make good use of the overviews from COGs, which we don't have in Xarray, right?
I remember hearing something about overviews being discussed for Zarr, but couldn't find that discussion...
Yes, it's much faster to render lower zoom levels when you have overviews, so that you can read less data instead of downsampling from full-resolution data. I've also seen discussions about multi-resolution Zarr datasets, but I also don't know where that was
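To make the idea of overviews concrete, here is a minimal sketch of precomputing a pyramid of progressively coarser copies of a 2D grid, so that a low-zoom request can read a small array instead of downsampling the full-resolution one each time. It uses plain Python lists and a 2x2 mean, which is only one possible resampling choice. The names `downsample_2x` and `build_overviews` are hypothetical and do not come from Zarr, rio-tiler, or xarray.

```python
from typing import List

Grid = List[List[float]]


def downsample_2x(grid: Grid) -> Grid:
    """Halve each dimension by averaging non-overlapping 2x2 blocks.

    A trailing odd row or column is dropped in this simplified sketch.
    """
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (
                grid[2 * r][2 * c]
                + grid[2 * r][2 * c + 1]
                + grid[2 * r + 1][2 * c]
                + grid[2 * r + 1][2 * c + 1]
            )
            / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_overviews(grid: Grid, levels: int) -> List[Grid]:
    """Return [full resolution, 1/2, 1/4, ...], stopping when too small."""
    pyramid = [grid]
    for _ in range(levels):
        current = pyramid[-1]
        if len(current) < 2 or len(current[0]) < 2:
            break
        pyramid.append(downsample_2x(current))
    return pyramid


if __name__ == "__main__":
    full = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    for level, overview in enumerate(build_overviews(full, levels=5)):
        print(level, overview)
```

A tile server would pick the pyramid level whose resolution is closest to the requested zoom and read only from that level.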
Found a good entry point into the zarr tile-server discussion with this comment from @rabernat: https://github.com/zarr-developers/community/issues/37#issuecomment-724767982
@TomAugspurger I've just had a look at xstac after reading your post on Pangeo's Discourse. I think it would be great to have a STAC API router built in xpublish! It could be an optional router... You already implemented pydantic models in xstac so it should be pretty straightforward to eventually integrate it in xpublish.
Thanks @benbovy. I've been trying to work out what exactly the relationship between STAC, Zarr, and xpublish would be. My (uninformed) hypothesis was that xpublish could be used to serve STAC Items (a chunk of an Xarray dataset, both the data from the Zarr chunk + the coordinates). If you have any thoughts on what use-cases this would serve and how it could be done I'd love to hear them.
@TomAugspurger I don't have specific use-cases in mind yet, lately I've just been following the development of STAC specs and related tools with great interest.
I think that xpublish could be used to serve an Xarray Dataset using either STAC Item-level or Collection-level asset(s), as you explain on Pangeo's Discourse. Or both?
I mainly see xpublish as a "Swiss Army Knife", flexible backend solution where any data source supported by Xarray (NetCDF, Zarr, GRIB, etc.) could be dynamically served via one or more standardized APIs (STAC, WMS, Zarr, etc.) "just" by doing
import xarray as xr
import xpublish
ds = xr.load_dataset(...)
ds.rest.serve()
The served data chunks (+ metadata) may also be dynamically generated independently of the original data chunks (if any) so it could fit a broader range of front-end applications, even though this wouldn't be optimal in all cases.
This would be a useful tool, complementary to statically generated data catalogs.
@rsignell-usgs I think this is a good idea.