xarray
xarray copied to clipboard
Expose `memory` argument for "netcdf4" engine
This commit exposes the memory argument for the "netcdf4" engine,
allowing one to create a netcdf dataset from a memory buffer, like so:
buffer: bytes = ...
ds = xr.open_dataset("", engine="netcdf4", memory=buffer)
- [x] Closes #6955
Thanks @ianliu you should be able to do this with xr.open_dataset(..., backend_kwargs={"memory": buffer})
@dcherian backend_kwargs doesn't work either, because the NetCDF4BackendEntrypoint.open_dataset function doesn't accept a memory argument:
https://github.com/pydata/xarray/blob/434f9e8929942afc2380eab52a07e77d30cc7885/xarray/backends/netCDF4_.py#L534-L552
Ah thanks. I always forget that.
Can you add a test please? Somewhere in this class would be the right place I think https://github.com/pydata/xarray/blob/434f9e8929942afc2380eab52a07e77d30cc7885/xarray/tests/test_backends.py#L1225
@dcherian I've added a unit test, but outside the class. Is this a problem? I see in the Xarray's contributing page that Xarray is transitioning to more functional tests.
Hmm, the test doesn't pass like this, it is complaining that nc4 is not defined. I will put the test inside the class.
@dcherian can you review my changes? I think the PR is done now.
I've just committed a change that also allows one to save a netcdf file to memory, like so:
ds = xr.Dataset({ "v": xr.DataArray(data=range(10)) })
buf = ds.to_netcdf(engine="netcdf4")
assert xr.open_dataset("", engine="netcdf4", memory=buf).equals(ds)
@ianliu - are you interested in finishing up this PR? From my perspective, this seems like a useful feature that would be nice to get in to xarray.
Hi @jhamman ! I'm willing to finish it, but don't know whats really missing. I guess I'll rebase with main and fix the failing tests.
But I would like a review on the second commit, since it adds more profound changes to the internal API's, such as returning a value from CachingFileManager.close function. This was needed because NetCDF4 returns the bytearray upon close.
Also, the logic on the to_netcdf function is starting to get a little bit convoluted, would be nice to get a review there as well.
@jhamman sorry for the delay. So, the return that you commented "reupping this question", I did respond the first code review but forgot to submit my answer... I will respond here as well, because I find all these code reviews confusing.
The return statement is there because the netCDF4 API returns the byte array upon closing the dataset. See the API docs here: https://unidata.github.io/netcdf4-python/#in-memory-diskless-datasets
@jhamman I think this PR is OK now, what do you think?
The error of mypy is real, but unfortunately this is a design error on our side, which is out of scope for this PR. see the mypy message:
xarray/backends/netCDF4_.py:563: error: Signature of "open_dataset" incompatible with supertype "BackendEntrypoint" [override]
so simply add a # type: ignore[override] at the end of this line.
It would be nice to have this merged. @ianliu are you able to continue this?
It would be nice to have this merged. @ianliu are you able to continue this?
Hey! I guess I can continue this weekend. I will be happy to get this done.
@headtr1ck I've just rebased my changes onto the main branch.