xarray icon indicating copy to clipboard operation
xarray copied to clipboard

Expose `memory` argument for "netcdf4" engine

Open ianliu opened this issue 3 years ago • 7 comments

This commit exposes the memory argument for the "netcdf4" engine, allowing one to create a netcdf dataset from a memory buffer, like so:

buffer: bytes = ...
ds = xr.open_dataset("", engine="netcdf4", memory=buffer)
  • [x] Closes #6955

ianliu avatar Aug 26 '22 14:08 ianliu

Thanks @ianliu you should be able to do this with xr.open_dataset(..., backend_kwargs={"memory": buffer})

dcherian avatar Aug 26 '22 15:08 dcherian

@dcherian backend_kwargs doesn't work either, because the NetCDF4BackendEntrypoint.open_dataset function doesn't accept a memory argument:

https://github.com/pydata/xarray/blob/434f9e8929942afc2380eab52a07e77d30cc7885/xarray/backends/netCDF4_.py#L534-L552

ianliu avatar Aug 26 '22 15:08 ianliu

Ah thanks. I always forget that.

Can you add a test please? Somewhere in this class would be the right place I think https://github.com/pydata/xarray/blob/434f9e8929942afc2380eab52a07e77d30cc7885/xarray/tests/test_backends.py#L1225

dcherian avatar Aug 26 '22 16:08 dcherian

@dcherian I've added a unit test, but outside the class. Is this a problem? I see in the Xarray's contributing page that Xarray is transitioning to more functional tests.

ianliu avatar Aug 26 '22 17:08 ianliu

Hmm, the test doesn't pass like this, it is complaining that nc4 is not defined. I will put the test inside the class.

ianliu avatar Aug 26 '22 17:08 ianliu

@dcherian can you review my changes? I think the PR is done now.

ianliu avatar Aug 26 '22 18:08 ianliu

I've just committed a change that also allows one to save a netcdf file to memory, like so:

ds = xr.Dataset({ "v": xr.DataArray(data=range(10)) })
buf = ds.to_netcdf(engine="netcdf4")
assert xr.open_dataset("", engine="netcdf4", memory=buf).equals(ds)

ianliu avatar Aug 30 '22 19:08 ianliu

@ianliu - are you interested in finishing up this PR? From my perspective, this seems like a useful feature that would be nice to get in to xarray.

jhamman avatar Jan 31 '23 15:01 jhamman

Hi @jhamman ! I'm willing to finish it, but don't know whats really missing. I guess I'll rebase with main and fix the failing tests.

But I would like a review on the second commit, since it adds more profound changes to the internal API's, such as returning a value from CachingFileManager.close function. This was needed because NetCDF4 returns the bytearray upon close.

Also, the logic on the to_netcdf function is starting to get a little bit convoluted, would be nice to get a review there as well.

ianliu avatar Jan 31 '23 16:01 ianliu

@jhamman sorry for the delay. So, the return that you commented "reupping this question", I did respond the first code review but forgot to submit my answer... I will respond here as well, because I find all these code reviews confusing.

The return statement is there because the netCDF4 API returns the byte array upon closing the dataset. See the API docs here: https://unidata.github.io/netcdf4-python/#in-memory-diskless-datasets

ianliu avatar Feb 12 '23 00:02 ianliu

@jhamman I think this PR is OK now, what do you think?

ianliu avatar Feb 27 '23 16:02 ianliu

The error of mypy is real, but unfortunately this is a design error on our side, which is out of scope for this PR. see the mypy message:

xarray/backends/netCDF4_.py:563: error: Signature of "open_dataset" incompatible with supertype "BackendEntrypoint" [override]

so simply add a # type: ignore[override] at the end of this line.

headtr1ck avatar Mar 22 '23 19:03 headtr1ck

It would be nice to have this merged. @ianliu are you able to continue this?

headtr1ck avatar Jul 11 '24 17:07 headtr1ck

It would be nice to have this merged. @ianliu are you able to continue this?

Hey! I guess I can continue this weekend. I will be happy to get this done.

ianliu avatar Jul 12 '24 18:07 ianliu

@headtr1ck I've just rebased my changes onto the main branch.

ianliu avatar Jul 13 '24 14:07 ianliu