Implement adapter for `OpenWithKerchunk | OpenWithXarray`

Open cisaacstern opened this issue 10 months ago • 6 comments

In theory, OpenWithKerchunk should be able to provide inputs for OpenWithXarray, which can help address issues such as https://github.com/pangeo-forge/pangeo-forge-recipes/issues/361. IIUC a version of this existed in 0.9.4 (or at least was in development there). Discussion in https://github.com/leap-stc/cmip6-leap-feedstock/issues/16#issuecomment-1694414477 reminded me that this would be a useful thing to implement (or re-implement, as the case may be).

cisaacstern avatar Aug 29 '23 19:08 cisaacstern

This seems like a great idea!

As far as design goes, OpenWithKerchunk returns a PCollection of in-memory references. It seems like there would need to be either:

  1. An additional PTransform that converts the PCollection of references into a PCollection of fsspec mappers that OpenWithXarray can read.
  2. An option within OpenWithKerchunk that returns fsspec mappers.
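For context on what either option would hand to xarray, here is a minimal sketch of opening in-memory kerchunk references through fsspec's `reference://` filesystem and the zarr engine. The helper names `reference_open_kwargs` and `open_references` are hypothetical and not part of pangeo-forge-recipes, and the `remote_protocol` default is an assumption that depends on where the source files live.

```python
from typing import Any, Dict


def reference_open_kwargs(
    refs: Dict[str, Any], remote_protocol: str = "file"
) -> Dict[str, Any]:
    """Build xr.open_dataset keyword arguments for in-memory kerchunk references.

    Hypothetical helper: `refs` is the reference dict produced by kerchunk,
    and `remote_protocol` names the filesystem holding the original files.
    """
    return {
        "engine": "zarr",
        "backend_kwargs": {
            # Reference sets usually carry no consolidated metadata key.
            "consolidated": False,
            # Forwarded to fsspec's ReferenceFileSystem.
            "storage_options": {"fo": refs, "remote_protocol": remote_protocol},
        },
    }


def open_references(refs: Dict[str, Any], remote_protocol: str = "file"):
    """Open the references lazily as an xarray Dataset (needs xarray, zarr, fsspec)."""
    import xarray as xr  # imported lazily so the kwargs helper has no dependencies

    return xr.open_dataset(
        "reference://", **reference_open_kwargs(refs, remote_protocol)
    )
```

In a Beam pipeline, something like `open_references` could be the body of the extra PTransform in option 1, applied to each element of the reference PCollection.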

Any thoughts here @cisaacstern?

norlandrhagen avatar Oct 03 '23 15:10 norlandrhagen

Good questions, @norlandrhagen.

> An option within OpenWithKerchunk that returns fsspec mappers.

I think I'd lean towards this option. The downside is that it introduces multiple return types into OpenWithKerchunk, but the benefit is that it keeps the user-facing API simpler.
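To make the trade-off concrete, here is a rough sketch of what a switchable output could look like at the per-element level. The function `emit_references` and its `output` parameter are hypothetical stand-ins, not the actual OpenWithKerchunk signature; the point is only that the return type depends on the flag.

```python
from typing import Any, Dict


def emit_references(
    refs: Dict[str, Any],
    output: str = "references",
    remote_protocol: str = "file",
):
    """Return either the raw reference dict or an fsspec mapper built from it.

    Hypothetical per-element logic: the caller has to know which of the two
    return types to expect, which is the downside noted above.
    """
    if output == "references":
        return refs
    if output == "mapper":
        import fsspec  # imported lazily; only needed for the mapper branch

        fs = fsspec.filesystem(
            "reference", fo=refs, remote_protocol=remote_protocol
        )
        return fs.get_mapper("")
    raise ValueError(f"Unknown output type: {output!r}")
```

A mapper returned this way can be passed to `xr.open_dataset(..., engine="zarr")`, which is what would let OpenWithXarray consume it.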

cisaacstern avatar Oct 03 '23 15:10 cisaacstern

Another option would be to have OpenWithXarray use an engine (an xr.open_dataset backend) that immediately knows what to do with the references (the "kerchunk" engine discussed in fsspec/kerchunk#360?).

keewis avatar Oct 03 '23 15:10 keewis

That would be the best way!

@keewis are you working on that PR/issue or do you know if there is any development on it?

norlandrhagen avatar Oct 03 '23 15:10 norlandrhagen

I'm not working on this nor am I planning to at the moment (and I'm not aware of anyone else doing so), but the development will most likely happen on the kerchunk repo.

keewis avatar Oct 03 '23 15:10 keewis

Thanks for mentioning this @keewis.

Whichever solution we choose here, let's link https://github.com/fsspec/kerchunk/issues/360 in a comment, and mention that the implementation here is a shim until that issue is resolved.

cisaacstern avatar Oct 03 '23 16:10 cisaacstern