pydap
xarray handler
xarray is basically an in-memory representation of netCDF-like data structures. It would be amazing to have a pydap handler for xarray. That way we could launch an opendap server from within python (possibly related to #33) and feed it an xarray dataset. Such services could be chained together to form streaming data pipelines without ever touching a hard disk.
@rabernat, I'm not sure it is related to #33. In #33, the goal was mostly testing, and my own experience tells me that starting a simple server from within python has many more pitfalls than advantages.
I do think, however, that implementing an xarray handler could make a lot of sense, because xarray can be used as a compatibility layer for many file types. By implementing an xarray handler we could essentially get a netcdf handler, handlers for all PyNIO formats and handlers for all pandas-compatible formats.
@jameshiebert, don't you think this approach could solve the merging and dependencies handling for handlers, as raised in #69?
@fujiisoup: this is the issue that I think would help solve the challenge you described on the xarray mailing list
We are considering asking some interns to work on the xarray pydap handler.
In the opinion of the pydap developers, would such an integration belong inside pydap or as a standalone project?
@fujiisoup's recent work in xarray's test suite converting an xarray.Dataset into a pydap Dataset might be a useful starting point for this: https://github.com/pydata/xarray/blob/b6300ea9d9e84e24fc2e03bdff06d8d0659e2344/xarray/tests/test_backends.py#L2003-L2017
@rabernat I think it could and it probably should reside inside pydap but, at the moment, the pydap repo is pretty much dormant. PRs have not been merged in months and I have very little time to devote to the project. Now could be a good time to rethink the governance structure of pydap. @jameshiebert do you have an opinion?
I haven't had a ton of time recently to devote to Pydap either, but that's not to say that I'll never have any time. I'm happy to rethink governance and/or take on more maintainers who do have time...
pydap is a very valuable package for the python geoscience community. Are any of the maintainers at agencies who have a mandate to support such tools (e.g. Unidata, NCAR, NOAA, USGS, etc)?
@jameshiebert maybe keeping the same governance but adding maintainers that could jointly merge PRs without your direct approval might prove positive for this project. It seems as if there was some clear community interest recently from which motivated individuals could be tapped.
We would like to start working on this again now that pydap is revived.
@TanBowen is an intern here at Columbia and might be able to contribute here.
But it would be great to have some advice on the best path forward.
The main challenge I see is that all current handlers are based around files. Xarray isn't a file format but rather a library for reading files. Eventually, we would like to be able to serve zarr files via pydap, but that would first require that pydap can "handle" xarray objects.
I guess we are just looking for some general advice from the pydap devs on what is the right way to approach this problem.
A good place to start might be looking at the helper function we created in xarray's tests for converting an xarray.Dataset into pydap.model.DatasetType:
https://github.com/pydata/xarray/blob/0d6056e8816e3d367a64f36c7f1a5c4e1ce4ed4e/xarray/tests/test_backends.py#L2323-L2337
This was created for client side tests in xarray (and is certainly not comprehensive), but I imagine that the server side of pydap also uses a DatasetType?
So following up @shoyer's comment, you would probably want to copy the general structure of the netCDF handler (https://github.com/pydap/pydap/blob/master/src/pydap/handlers/netcdf/__init__.py), but replace all calls to the netCDF library with xarray, following the example from the xarray backend test suite. You might not need all the LazyVariable stuff from the netCDF handler, since xarray's variables are already lazy.
@shoyer wow, that's very nice work!
Ping!
Has anyone made any progress on this? With all sorts of stuff being moved to the "Cloud", Zarr support would be really nice. And xarray is a good way to get there.
To my knowledge no one has done further work on this issue. It's really too bad, because it is low-hanging fruit that would open up all kinds of cool possibilities.
The other day @jhamman told me that @markcapece of NOAA has developed a new package called zarrdap which meets a similar need - described obscurely here: https://markcapece.net/projects/creations/zarrdap/
@ChrisBarker-NOAA @rabernat Took a little while: https://github.com/NCEI-NOAAGov/zarrdap
Very cool -- awesome!