xarray
Keeping unused dimensions when opening a Dataset?
What is your issue?
I am attempting to open a dataset which has unused dimensions. Is it possible for this information to be retained?
import xarray as xr
import netCDF4
import numpy as np
# create dataset with dims x, y and a variable f
f = netCDF4.Dataset("test.nc", "w")
f.createDimension("x", 2)
f.createDimension("y", 3)
f.createVariable("f", np.float32)
print(f)
f.close()
print('\n')
# open dataset with xarray
ds = xr.open_dataset('test.nc')
print(ds)
This prints the following:
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x(2), y(3)
variables(dimensions): float32 f()
groups:
<xarray.Dataset> Size: 4B
Dimensions: ()
Data variables:
f float32 4B ...
The output of ncdump test.nc is shown here:
netcdf test {
dimensions:
x = 2 ;
y = 3 ;
variables:
float f ;
data:
f = _ ;
}
This appears to be the same issue as was discussed here, but I could not find if the OP ever opened an issue.
@whophil Thanks for raising this. Could you please add the output of your example and also the output of $ ncdump test.nc? Thanks.
Thanks @kmuehlbauer , I've added the output and ncdump output in the original post.
@whophil I've never experienced that, maybe because my dimensions are always backed by coordinates.
I'll have a look later, but there is a good chance someone else can answer right away.
Would you mind adding the output of xr.show_versions()? Did you check with engine="h5netcdf", too?
What would you want the resultant Dataset to look like though? Given that in Xarray's model the Dataset dims are just the set of all dims present on the various Variables.
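To illustrate the point about the data model, here is a minimal sketch (the variable names f and g are only for illustration) showing that a Dataset's dimensions are derived from its variables, so a dimension that no variable uses has nowhere to live:

```python
import numpy as np
import xarray as xr

# A scalar variable has no dimensions, so the Dataset has none either.
ds = xr.Dataset({"f": ((), np.float32(1.0))})
dims_before = dict(ds.sizes)
print(dims_before)

# Dimensions only appear once some variable actually uses them.
ds["g"] = (("x", "y"), np.zeros((2, 3), dtype=np.float32))
dims_after = dict(ds.sizes)
print(dims_after)
```

The first print shows an empty mapping, the second shows x and y with the sizes taken from g.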
Thanks @kmuehlbauer @TomNicholas!
Let me start by saying I am a new user of xarray and not entirely familiar with its data model.
I am using xarray to modify and rewrite NetCDF files which are provided to me by an upstream process. Xarray seemed like a nice way to do this, in particular because of its support for in-memory data, which is decidedly not-as-nice in the netCDF4 library.
What would you want the resultant Dataset to look like though? Given that in Xarray's model the Dataset dims are just the set of all dims present on the various Variables.
@TomNicholas ideally I would want the ncdump of the .nc file to appear the same as the one produced by netCDF4. Perhaps this is not compatible with xarray's data model? If so, that is totally fine. Thanks so much!
@TomNicholas Wouldn't this be a case where the resulting Dataset would have some extra information in the encoding attribute? No use in xarray, but the netCDF will roundtrip.
I would have thought this would not be possible in xarray, but @kmuehlbauer will know better than me whether the information can be preserved through some encoding option.
OK, @DocOtak's hunch is right, we would be able to preserve that information in encoding. Currently only unlimited dimensions are kept in ds.encoding.
https://github.com/pydata/xarray/blob/b9780e7a32b701736ebcf33d9cb0b380e92c91d5/xarray/backends/netCDF4_.py#L517-L522
So the following will work, but you will not be able to keep any dimension sizes:
import xarray as xr
import netCDF4
import numpy as np
# create dataset with dims x, y and a variable f
with netCDF4.Dataset("test.nc", "w") as f:
    # unlimited dimensions (size None) are the only ones kept in ds.encoding
    f.createDimension("x", None)
    f.createDimension("y", None)
    f.createVariable("f", np.float32)
print('\n')
# open dataset with xarray
with xr.open_dataset('test.nc') as ds:
    print(ds.encoding)
{'unlimited_dims': {'y', 'x'}, 'source': '/home/kai/projects/data/gists/xarray/datatree/test.nc'}
If you want to preserve the limited dims, too, you would need to add another loop in the above code. We might even think about making this a feature and try to preserve, in encoding, limited dimensions which are not used by any variable. I'm not sure how much is involved to make this work.
Thanks for the discussion @kmuehlbauer @TomNicholas @DocOtak!
As a workaround, I store the unused dimensions alongside my xarray.Dataset and, after writing it, add this information back to the netCDF4.Dataset using the native I/O methods from the netCDF4 library. This works fine for my use case.
@whophil Thanks for coming back. Would you mind adding a simple example of your workaround as MCVE? That would help others who search for a similar solution.
I'm not sure whether this feature will be included in the near future, though.