Extending xpublish with new routers

Open jhamman opened this issue 4 years ago • 17 comments

In #29, @benbovy made it much easier to plug new routers into the Xpublish API. Currently, there are three default routers:

  1. Base - /keys, /dict, etc...
  2. Common - /versions and datasets/endpoints
  3. Zarr - Zarr HTTP Store compatible endpoint

My intent with this issue is to explore what other routers / protocols may be applicable in this framework. Two potential candidates that have been floated before are:

  1. OPeNDAP / ERDDAP
  2. WMS

Curious what others think.

cc @lsetiawan, @benbovy, @ocefpaf, @rsignell-usgs

jhamman avatar Aug 13 '20 05:08 jhamman

.to_dataframe().to_csv()?

willirath avatar Aug 13 '20 05:08 willirath

I think it is a good idea to have built-in support in xpublish of established protocols like WMS, OPeNDAP, etc. alongside new protocols like Zarr. Xpublish is now very flexible so it should be pretty easy to add those protocols.

It looks like Xarray-leaflet provides something close to a WMS, but tightly coupled to ipyleaflet.

benbovy avatar Aug 13 '20 07:08 benbovy

It looks like Xarray-leaflet provides something close to a WMS, but tightly coupled to ipyleaflet.

Mmm maybe closer to a "XYZ" service. I'm not very familiar with all variants of web mapping services.
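For readers less familiar with the distinction: an "XYZ" service addresses fixed map tiles by zoom, column, and row, whereas WMS requests an image for an arbitrary bounding box. A minimal sketch of the standard Web Mercator tile arithmetic is below. The function name `tile_bounds` is an assumption made for illustration, and the code is not taken from Xarray-leaflet or xpublish.

```python
import math
from typing import Tuple


def tile_bounds(x: int, y: int, z: int) -> Tuple[float, float, float, float]:
    """Return (west, south, east, north) in degrees for XYZ tile x/y at zoom z.

    Hypothetical helper using the usual Web Mercator tiling scheme,
    with row 0 at the top of the map.
    """
    n = 2 ** z  # number of tiles along each axis at this zoom level
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError(f"tile ({x}, {y}) is outside zoom level {z}")

    def lon(col: int) -> float:
        # Longitude is linear in the tile column
        return col / n * 360.0 - 180.0

    def lat(row: int) -> float:
        # Latitude follows the inverse Mercator projection of the tile row
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


# The single zoom-0 tile covers the whole Web Mercator extent
west, south, east, north = tile_bounds(0, 0, 0)
```

A tile endpoint could use bounds like these to select the matching slice of a dataset before rendering it.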

benbovy avatar Aug 13 '20 07:08 benbovy

This is exciting!

OPeNDAP for sure would be awesome.

WMS is a map service that returns JPG, PNG images, not actual data, so is the thought to use Holoviz/Datashader to deliver the WMS images?

That seems like a natural fit, since the "rasterize" function already delivers geoimagery at a specified pixel size! https://github.com/holoviz/datashader/issues/831

rsignell-usgs avatar Aug 13 '20 14:08 rsignell-usgs

I would start with OPeNDAP because there is a pure Python implementation of the server, so it would be easier. I have no idea what it would take to do the same for ERDDAP; the docs and specs are not easy to navigate.

ocefpaf avatar Aug 13 '20 14:08 ocefpaf

Other ideas (formats): cf-json, netcdf-ld, covjson.

There's a good discussion + links here: https://github.com/pangeo-data/pangeo-datastore/issues/3

I started playing with covjson and xpublish here: https://github.com/ESM-VFC/esm-vfc-api-demo/pull/11

benbovy avatar Sep 17 '20 07:09 benbovy

I just had a look at Titiler, which already has a great set of features for serving geospatial raster data: multiple tile formats (raw data or image), multiple projections, wmts, etc.

It would be great if we could avoid reinventing the wheel here, i.e., depend on Titiler to create web map tiles dynamically from xarray Datasets and serve them via Xpublish!

@vincentsarago @kylebarron -- are Titiler's router factory classes part of the public API? Do you think it would be feasible to subclass and/or adapt it so that we can replace the URL (BaseFactory.path_dependency) by our dependencies to access the xarray Dataset being served? And/or replace COGReader with a custom reader/backend?

benbovy avatar Dec 07 '20 16:12 benbovy

@benbovy thanks for the interest. To be honest, TiTiler is still in Alpha because it depends on 2 alpha/beta/rc modules: rio-tiler/cogeo-mosaic. I hope to publish the final versions of those packages before the end of the year, but for now I just want users to understand this.

are Titiler's router factory classes part of the public API?

Yes

Do you think it would be feasible to subclass and/or adapt it so that we can replace the URL (BaseFactory.path_dependency) by our dependencies to access the xarray Dataset being served?

I'm not sure I understand, but yes, we built TiTiler with modularity in mind.

And/or replace COGReader with a custom reader/backend?

This is the first goal of the TilerFactory https://github.com/developmentseed/titiler/blob/master/titiler/endpoints/stac.py#L73-L78

https://developmentseed.org/titiler/concepts/customization/

vincentsarago avatar Dec 07 '20 16:12 vincentsarago

We've had discussions somewhere about Zarr vs Numpy for sending uncompressed data back to the client. https://github.com/zarr-developers/community/issues/37 I've thought about making a PR to explicitly add Zarr support to titiler, but haven't had time to pursue that idea

kylebarron avatar Dec 07 '20 16:12 kylebarron

Thanks for the quick answers!

TiTiler is still in Alpha

That's perfectly fine! Xpublish is at an early stage of development too, and it is also built with modularity in mind so that we can easily experiment with new functionalities (pluggable routers).

We've had discussions somewhere about Zarr vs Numpy for sending uncompressed data back to the client. I've thought about making a PR to explicitly add Zarr support to titiler, but haven't had time to pursue that idea

That would be great, although here I'm thinking more about leveraging Titiler in order to extend Xpublish with "web mapping friendly" API endpoints (i.e., OGC-compliant endpoints, morecantile generated tiles, images/colormaps, etc.). Those endpoints would work with any data format that can be loaded with Xarray, and might also co-exist with other, non-geospatial API endpoints (e.g., serving raw multi-dimensional data using various protocols like OPeNDAP, Zarr, etc.).

I need to look at Titiler more in depth. My understanding is that a Titiler router factory relies on a dataset path (PathParams) + reader for all its endpoints, whereas an xpublish router relies on a get_dataset dependency (that directly returns an xarray.Dataset object) for all its endpoints. So I guess I need to figure out how to adapt one to the other.
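One way to picture the adaptation is a reader that keeps a path-style call signature but resolves its data through a dataset dependency. The sketch below uses plain Python only. `DatasetBackedReader`, `reader_factory`, and `make_get_dataset` are hypothetical names, the dictionary stands in for an xarray.Dataset, and treating the path as a variable name is an assumption. None of this is Titiler's or xpublish's actual API.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for an xarray.Dataset: variable name -> values
Dataset = Dict[str, Any]


def make_get_dataset(dataset: Dataset) -> Callable[[], Dataset]:
    """Build a zero-argument dependency that returns the served dataset."""
    def get_dataset() -> Dataset:
        return dataset
    return get_dataset


class DatasetBackedReader:
    """Hypothetical reader with a path-based signature, backed by a dataset.

    The path is assumed to name a variable instead of pointing at a file.
    """

    def __init__(self, path: str, get_dataset: Callable[[], Dataset]):
        self.path = path
        self._dataset = get_dataset()

    def __enter__(self) -> "DatasetBackedReader":
        return self

    def __exit__(self, *exc_info: Any) -> bool:
        return False  # nothing to close; do not swallow exceptions

    def read(self) -> Any:
        # Look the variable up in the in-memory dataset
        return self._dataset[self.path]


def reader_factory(get_dataset: Callable[[], Dataset]) -> Callable[[str], DatasetBackedReader]:
    """Bind the dependency so callers only need to pass a path."""
    def open_reader(path: str) -> DatasetBackedReader:
        return DatasetBackedReader(path, get_dataset)
    return open_reader


served = {"temperature": [[280.0, 281.5], [279.2, 282.1]]}
open_reader = reader_factory(make_get_dataset(served))
with open_reader("temperature") as reader:
    values = reader.read()
```

Whether a real integration would look like this depends on how Titiler's factories accept custom readers and dependencies, which the linked customization docs describe.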

benbovy avatar Dec 07 '20 17:12 benbovy

@kylebarron, dynamic tile services such as titiler make good use of the overviews from COGs, which we don't have in Xarray, right?

I remember hearing something about overviews being discussed for Zarr, but couldn't find that discussion...

rsignell-usgs avatar Dec 07 '20 17:12 rsignell-usgs

Yes, it's much faster to render lower zoom levels when you have overviews, so that you can read less data instead of downsampling from full-resolution data. I've also seen discussions about multi-resolution Zarr datasets, but I don't know where those were either.
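To make the idea of overviews concrete, here is a minimal sketch that precomputes a small pyramid by averaging 2x2 blocks. It uses nested Python lists in place of real arrays. `downsample_2x` and `build_overviews` are hypothetical names, and averaging is only one of several possible resampling methods.

```python
from typing import List

Grid = List[List[float]]


def downsample_2x(grid: Grid) -> Grid:
    """Halve the resolution by averaging each 2x2 block (even dimensions only)."""
    rows, cols = len(grid), len(grid[0])
    if rows % 2 or cols % 2:
        raise ValueError("grid dimensions must be even")
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def build_overviews(grid: Grid, levels: int) -> List[Grid]:
    """Return [full resolution, 1/2 resolution, 1/4 resolution, ...]."""
    pyramid = [grid]
    for _ in range(levels):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


# A 4x4 grid yields a 2x2 overview and then a 1x1 overview
full = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_overviews(full, levels=2)
```

With such a pyramid stored next to the data, a low-zoom tile request can read the small overview instead of reducing the full-resolution grid on every request.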

kylebarron avatar Dec 07 '20 18:12 kylebarron

Found a good entry point into the zarr tile-server discussion with this comment from @rabernat: https://github.com/zarr-developers/community/issues/37#issuecomment-724767982

rsignell-usgs avatar Dec 07 '20 18:12 rsignell-usgs

@TomAugspurger I've just had a look at xstac after reading your post on Pangeo's Discourse. I think it would be great to have a STAC API router built in xpublish! It could be an optional router... You already implemented pydantic models in xstac so it should be pretty straightforward to eventually integrate it in xpublish.

benbovy avatar Jun 24 '21 20:06 benbovy

Thanks @benbovy. I've been trying to work out what exactly the relationship between STAC, Zarr, and xpublish would be. My (uninformed) hypothesis was that xpublish could be used to serve STAC Items (a chunk of an Xarray dataset, both the data from the Zarr chunk + the coordinates). If you have any thoughts on what use-cases this would serve and how it could be done I'd love to hear them.

TomAugspurger avatar Jul 14 '21 14:07 TomAugspurger

@TomAugspurger I don't have specific use-cases in mind yet; lately I've just been following the development of STAC specs and related tools with great interest.

I think that xpublish could be used to serve an Xarray Dataset using either STAC Item-level or Collection-level asset(s), as you explain on Pangeo's Discourse. Or both?

I mainly see xpublish as a "Swiss Army Knife", flexible backend solution where any data source supported by Xarray (NetCDF, Zarr, GRIB, etc.) could be dynamically served via one or more standardized APIs (STAC, WMS, Zarr, etc.) "just" by doing

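import xarray as xr
import xpublish  # registers the `.rest` accessor on xarray Datasets

# Load any Xarray-readable source (the path is elided in the original post)
ds = xr.load_dataset(...)
# Serve the dataset over HTTP with the default routers
ds.rest.serve()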

The served data chunks (+ metadata) may also be dynamically generated independently of the original data chunks (if any) so it could fit a broader range of front-end applications, even though this wouldn't be optimal in all cases.

This would be a useful tool, complementary to statically generated data catalogs.

benbovy avatar Jul 22 '21 16:07 benbovy

@rsignell-usgs I think this is a good idea.

abuddenb avatar Dec 09 '22 15:12 abuddenb