xarray handler

Open rabernat opened this issue 8 years ago • 16 comments

xarray is basically an in-memory representation of netCDF-like data structures. It would be amazing to have a pydap handler for xarray. That way we could launch an opendap server from within python (possibly related to #33) and feed it an xarray dataset. Such services could be chained together to form streaming data pipelines without ever touching a hard disk.

rabernat avatar Mar 03 '17 17:03 rabernat

@rabernat, I'm not sure it is related to #33. In #33, the goal was mostly testing, and my own experience tells me that starting a simple server from within python has many more pitfalls than advantages. I do think, however, that implementing an xarray handler could make a lot of sense, because xarray can be used as a compatibility layer for many file types. By implementing an xarray handler we could essentially get a netcdf handler, handlers for all PyNIO formats and handlers for all pandas-compatible formats.

@jameshiebert, don't you think this approach could solve the merging and dependencies handling for handlers, as raised in #69?

laliberte avatar Mar 05 '17 17:03 laliberte

@fujiisoup: this is the issue that I think would help solve the challenge you described on the xarray mailing list

rabernat avatar Nov 20 '17 16:11 rabernat

We are considering asking some interns to work on the xarray pydap handler.

In the opinion of the pydap developers, would such an integration belong inside pydap or as a standalone project?

rabernat avatar Jan 10 '18 15:01 rabernat

@fujiisoup's recent work in xarray's test suite converting an xarray.Dataset into a pydap Dataset might be a useful starting point for this: https://github.com/pydata/xarray/blob/b6300ea9d9e84e24fc2e03bdff06d8d0659e2344/xarray/tests/test_backends.py#L2003-L2017

shoyer avatar Jan 10 '18 16:01 shoyer

@rabernat I think it could and it probably should reside inside pydap but, at the moment, the pydap repo is pretty much dormant. PRs have not been merged in months and I have very little time to devote to the project. Now could be a good time to rethink the governance structure of pydap. @jameshiebert do you have an opinion?

laliberte avatar Jan 10 '18 19:01 laliberte

> Now could be a good time to rethink the governance structure of pydap. @jameshiebert do you have an opinion?

I haven't had a ton of time recently to devote to Pydap either, but that's not to say that I'll never have any time. I'm happy to rethink governance and/or take on more maintainers who do have time...

jameshiebert avatar Jan 10 '18 19:01 jameshiebert

pydap is a very valuable package for the python geoscience community. Are any of the maintainers at agencies who have a mandate to support such tools (e.g. Unidata, NCAR, NOAA, USGS, etc)?

rabernat avatar Jan 10 '18 19:01 rabernat

@jameshiebert maybe keeping the same governance but adding maintainers that could jointly merge PRs without your direct approval might prove positive for this project. It seems as if there was some clear community interest recently from which motivated individuals could be tapped.

laliberte avatar Jan 10 '18 19:01 laliberte

We would like to start working on this again now that pydap is revived.

@TanBowen is an intern here at Columbia and might be able to contribute here.

But it would be great to have some advice on the best path forward.

The main challenge I see is that all current handlers are based around files. Xarray isn't a file format but rather a library for reading files. Eventually, we would like to be able to serve zarr files via pydap, but that would first require that pydap can "handle" xarray objects.

I guess we are just looking for some general advice from the pydap devs on what is the right way to approach this problem.

rabernat avatar Nov 29 '18 20:11 rabernat

A good place to start might be looking at the helper function we created in xarray's tests for converting an xarray.Dataset into pydap.model.DatasetType: https://github.com/pydata/xarray/blob/0d6056e8816e3d367a64f36c7f1a5c4e1ce4ed4e/xarray/tests/test_backends.py#L2323-L2337

This was created for client side tests in xarray (and is certainly not comprehensive), but I imagine that the server side of pydap also uses a DatasetType?

shoyer avatar Nov 29 '18 20:11 shoyer

Following up on @shoyer's comment, you would probably want to copy the general structure of the netCDF handler (https://github.com/pydap/pydap/blob/master/src/pydap/handlers/netcdf/__init__.py), but replace all calls to the netCDF library with xarray, following the example from the xarray backend test suite. You might not need all the LazyVariable stuff from the netCDF handler, since xarray's variables are already lazy.

rabernat avatar Nov 30 '18 15:11 rabernat

@shoyer wow, that's very nice work!

XiaoLinhong avatar Dec 07 '18 08:12 XiaoLinhong

Ping!

Has anyone made any progress on this? With all sorts of stuff being moved to the "Cloud", Zarr support would be really nice. And xarray is a good way to get there.

ChrisBarker-NOAA avatar Nov 19 '21 20:11 ChrisBarker-NOAA

To my knowledge, no one has done further work on this issue. It's really too bad, because it is low-hanging fruit that would open up all kinds of cool possibilities.

The other day @jhamman told me that @markcapece of NOAA has developed a new package called zarrdap which meets a similar need - described obscurely here: https://markcapece.net/projects/creations/zarrdap/

rabernat avatar Nov 19 '21 21:11 rabernat

@ChrisBarker-NOAA @rabernat Took a little while: https://github.com/NCEI-NOAAGov/zarrdap

abuddenb avatar Feb 24 '22 21:02 abuddenb

Very cool -- awesome!

ChrisBarker-NOAA avatar Feb 25 '22 00:02 ChrisBarker-NOAA