
Converting field(s) to & from CF-compliant `xarray.Dataset`

Open sadielbartholomew opened this issue 1 year ago • 1 comments

A common way to work with weather and climate datasets is, of course, the xarray library, so it would be nice to have a means of converting a field to and from an `xarray.Dataset`, provided the Dataset has sufficient, conformant-enough CF metadata and a suitable structure. The `xarray.decode_cf` function may be able to assist on the xarray side.
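For context, here is a minimal sketch of what `xarray.decode_cf` does on the xarray side. The variable name `tas`, the attribute values and the data are made up for illustration, and this is not a proposed cf-python API: it only shows xarray interpreting CF time units, packing attributes and fill values on an in-memory Dataset.

```python
import numpy as np
import xarray as xr

# A "raw" Dataset as it might look before any CF decoding: packed integers
# plus CF attributes that describe how to interpret them.
# All names and values here are illustrative assumptions.
raw = xr.Dataset(
    data_vars={
        "tas": (
            ("time",),
            np.array([10, 20, -999], dtype="int16"),
            {
                "standard_name": "air_temperature",
                "units": "K",
                "scale_factor": 0.5,
                "add_offset": 270.0,
                "_FillValue": np.int16(-999),
            },
        )
    },
    coords={
        "time": (
            ("time",),
            np.array([0, 1, 2]),
            {"units": "days since 2000-01-01", "calendar": "standard"},
        )
    },
)

# decode_cf applies the CF conventions it understands: time units become
# datetimes, scale_factor/add_offset are applied, and fill values are masked.
decoded = xr.decode_cf(raw)

print(decoded["time"].values)
print(decoded["tas"].values)
```

Attributes that xarray does not act on, such as `standard_name`, are carried through unchanged on the decoded variable, so any conversion to a field would still need to interpret them.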

Ideally we would have a method for such a conversion, or at least a recipe in the documentation showing how it can be done, both for a trivially convertible case and for a field and/or Dataset that needs some tweaks to get the right structure before the conversion can be made.

Note that in #706 we considered and supported input of xarray arrays, which is a step towards this.

sadielbartholomew avatar May 22 '24 16:05 sadielbartholomew

It could be interesting to explore adding `to_cfpython` and `from_cfpython` functions to the ncdata package developed by @pp-mo. That would allow users to switch seamlessly between the most commonly used Python weather and climate data analysis packages, even within a single analysis.

A really nice feature of the ncdata package is that it provides a clone of the NetCDF4 Dataset API and functions to convert Xarray Datasets to and from it. This allows users who have data in Xarray that does not comply with the CF conventions (e.g. loaded from Zarr using a third-party object store, or from a NetCDF file in a central location on an HPC system) to convert their Xarray Dataset to an in-memory NetCDF Dataset, fix the NetCDF attributes to make the data CF-compliant, and then interpret the result with a package that works with the CF conventions. This is convenient for users, as it allows loading any data supported by Xarray in a computationally efficient way while still benefitting from the advantages that CF-aware packages like cf-python and Iris bring. It is also convenient from a software maintenance point of view, as it removes the need for custom logic in each package to handle all the possible ways in which datasets can be non-compliant with the CF conventions.

bouweandela avatar Oct 07 '24 08:10 bouweandela