Extend the ESM collection specification to support catalogs containing datasets in GRIB format

Open andersy005 opened this issue 2 years ago • 0 comments

Is your feature request related to a problem? Please describe.

Currently, the only valid data formats supported by intake-esm are netCDF and Zarr. As a result, it's not possible to use intake-esm when working with datasets in GRIB format.

Describe the solution you'd like

We should consider extending the ESM collection specification (https://github.com/intake/intake-esm/blob/main/docs/source/explanation/esm-collection-spec.md#assets-object) to allow catalogs to contain datasets in GRIB format.
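As a rough illustration of what such an extension might involve, the sketch below checks an assets object against a set of allowed formats. The `column_name` and `format` fields come from the existing assets object in the spec; the `"grib"` value, the two format sets, and the `validate_assets` helper are assumptions made up for this example, not part of the current spec or of intake-esm's API.

```python
# Formats accepted today, as described in this issue.
CURRENT_FORMATS = {"netcdf", "zarr"}

# Hypothetical extended set; "grib" is the proposed addition, not a valid value yet.
PROPOSED_FORMATS = CURRENT_FORMATS | {"grib"}


def validate_assets(assets, allowed_formats):
    """Return True if the assets object names a path column and an allowed format.

    Hypothetical helper for illustration only; intake-esm does its own validation.
    """
    if "column_name" not in assets:
        return False
    return assets.get("format") in allowed_formats


# An assets object as it might appear in a catalog of GRIB files.
grib_assets = {"column_name": "path", "format": "grib"}

print(validate_assets(grib_assets, CURRENT_FORMATS))   # rejected under today's formats
print(validate_assets(grib_assets, PROPOSED_FORMATS))  # accepted once "grib" is allowed
```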

Additional context

I'm copying and pasting the comment in https://github.com/intake/intake-esm/issues/66#issuecomment-1061610199

I made one for DKRZ ERA5 data, which is on our HPC's disk storage. That means the data access only works on Mistral.

import intake
dkrz_cdp = intake.open_catalog("https://swift.dkrz.de/v1/dkrz_a44962e3ba914c309a7421573a6949a6/intake-esm/dkrz_data-pool_cloudcatalog.yaml")
esm_dkrz_era = dkrz_cdp.dkrz_era5_disk_grb_fromcloud

should work.

However, it is based on raw GRIB data, and intake-esm does not really support GRIB. We would need a default engine for GRIB here; the one I use is cfgrib. You may say that users can provide the engine themselves, but since there is no grib_kwargs, I cannot, for example, merge GRIB assets with netCDF assets. That is why I say there is no GRIB support in intake-esm.

Best, Fabi

Originally posted by @wachsylon in https://github.com/intake/intake-esm/issues/66#issuecomment-1061610199
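One way to picture the "default engine" and per-format keyword arguments mentioned in the comment above is a small lookup that maps each asset format to default options for `xarray.open_dataset`. This is only a sketch under stated assumptions: `DEFAULT_OPEN_KWARGS` and `open_kwargs_for` are hypothetical names, and intake-esm does not currently expose anything like this. The `"zarr"` and `"cfgrib"` engine names are real xarray backend names (the latter requires the cfgrib package).

```python
# Hypothetical per-format defaults; netCDF is left empty so xarray picks its own engine.
DEFAULT_OPEN_KWARGS = {
    "netcdf": {},
    "zarr": {"engine": "zarr"},
    "grib": {"engine": "cfgrib"},
}


def open_kwargs_for(data_format, user_kwargs=None):
    """Merge the format's default keyword arguments with user-supplied overrides."""
    if data_format not in DEFAULT_OPEN_KWARGS:
        raise ValueError(f"unsupported data format: {data_format!r}")
    merged = dict(DEFAULT_OPEN_KWARGS[data_format])  # copy so the defaults stay untouched
    merged.update(user_kwargs or {})                 # user options take precedence
    return merged


# GRIB assets fall back to cfgrib unless the user says otherwise.
grib_kwargs = open_kwargs_for(
    "grib", {"backend_kwargs": {"filter_by_keys": {"typeOfLevel": "surface"}}}
)
netcdf_kwargs = open_kwargs_for("netcdf")
```

With something along these lines, each asset could be opened as `xarray.open_dataset(path, **open_kwargs_for(fmt))`, so GRIB and netCDF assets in the same catalog would each get suitable options before being merged.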

andersy005 · Mar 08 '22 13:03