kedro-plugins
kedro-plugins copied to clipboard
feat(datasets): `NetCDFDataset` support for `engine="h5netcdf"` [#620]
Description
https://github.com/kedro-org/kedro-plugins/issues/620
NetCDFDataset currently only supports engine="scipy", which uses an old version of NetCDF (NETCDF3). This is because the original implementation writes the data to a byte-buffer, for which xarray only supports engine="scipy".
This PR enables engine="h5netcdf" and engine="netcdf4", which support the more recent NETCDF4 HDF5 format.
Development notes
- Based on suggestion by @astrojuanlu, I save to a temporary file and copy to
fsspecwhen using a remote file-system. - Added pytest parametrize to use
engine="h5netcdf"andengine="netcdf4"
Checklist
- [x] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
- [x] Updated the documentation to reflect the code changes
- [x] Added a description of this change in the relevant
RELEASE.mdfile - [x] Added tests to cover my changes
The unit tests seem to be failing potentially for a separate reason, although I'm not quite sure. Does anyone have insight into this?
Hi @charlesbmi, thanks for opening this PR! I see it's still in draft, do you need any help to get it ready for review?
@merelcht thanks for checking in! Yes, help would be great.
I am not sure how to fix the doctest error:
_________ [doctest] kedro_datasets.netcdf.netcdf_dataset.NetCDFDataset _________
051 >>> from kedro_datasets.netcdf import NetCDFDataset
052 >>> import xarray as xr
053 >>> ds = xr.DataArray(
054 ... [0, 1, 2], dims=["x"], coords={"x": [0, 1, 2]}, name="data"
055 ... ).to_dataset()
056 >>> dataset = NetCDFDataset(
057 ... filepath="path/to/folder",
058 ... save_args={"mode": "w"},
059 ... )
060 >>> dataset.save(ds)
UNEXPECTED EXCEPTION: DatasetError("Failed while saving data to data set NetCDFDataset(filepath=path/to/folder, load_args={}, protocol=file, save_args={'mode': w}).\n[Errno 13] Permission denied: '/home/runner/work/kedro-plugins/kedro-plugins/kedro-datasets/path/to/folder'")
It seems slightly separate from the code I've updated, and I'm not sure whether the runner is supposed to have permissions to add/modify random folders (i.e., presumably it is because path/to/ doesn't exist, and it is trying to create a path/to/folder?)
Would you be able to help me figure out how to resolve the test issue? Thanks!
@charlesbmi I'll have a look 🙂
@charlesbmi the doctest issue is now solved!
Thank you @merelcht ! I'll fix up my lint/test errors, and then I'll update the documentation / RELEASE.md this week
Fixed the rest of the tests and updated RELEASE.md, thanks for your help @merelcht !
Thanks! Updated, and tests are passing now 🎉 I appreciate your help
Thanks! Updated, and tests are passing now 🎉 I appreciate your help
You're welcome and thank you for your contribution! 😄 I've asked some other team members to review. We'll merge when we have 2 approvals.