xarray
Dimensions re-defined in netCDF groups
What happened?
Dimensions from the root group are re-defined in subgroups when saving a DataTree to netCDF.
What did you expect to happen?
Subgroups should inherit dimensions from the root group, as described in the CF conventions.
Minimal Complete Verifiable Example
import xarray as xr
import netCDF4
# Create sample tree
only_coords = xr.Dataset(
    coords={"y": [1, 2], "x": [1, 2]}
)
only_data = xr.Dataset(
    {"myvar": (("y", "x"), [[1, 2], [3, 4]])}
)
datasets = {
    "/": only_coords,
    "/data": only_data,
}
tree = xr.DataTree.from_dict(datasets)
# Save to netCDF
tree.to_netcdf("test.nc")
# Display file contents
ds = netCDF4.Dataset("test.nc")
print(ds.groups["data"])
MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
<class 'netCDF4.Group'>
group /data:
dimensions(sizes): y(2), x(2)
variables(dimensions): int64 myvar(y, x)
Anything else we need to know?
Using netCDF4 directly, it is possible to create a group that contains only the variable and inherits its dimensions from the root group:
import netCDF4
import numpy as np
root_group = netCDF4.Dataset("test2.nc", mode="w")
root_group.createDimension("y", 2)
root_group.createDimension("x", 2)
root_group.createVariable("y", "i8", ("y",))
root_group.createVariable("x", "i8", ("x",))
data_group = root_group.createGroup("data")
data_var = data_group.createVariable("data", "i8", ("y", "x"))
data_var[:] = np.array([[1, 2], [3, 4]])
root_group.close()
ncdump:
netcdf test2 {
dimensions:
    y = 2 ;
    x = 2 ;
variables:
    int64 y(y) ;
    int64 x(x) ;

group: data {
  variables:
    int64 data(y, x) ;
  } // group data
}
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.9 | packaged by conda-forge | (main, Feb 14 2025, 08:00:06) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-513.5.1.el8_9.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf-8
LANG: en_US
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2025.3.1
pandas: 2.2.3
numpy: 2.1.3
scipy: 1.15.2
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.5.0
h5py: 3.13.0
zarr: 2.18.4
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.2.0
distributed: 2025.2.0
matplotlib: 3.10.0
cartopy: 0.24.0
seaborn: 0.13.2
numbagg: None
fsspec: 2025.2.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.8.2
pip: 25.0.1
conda: None
pytest: 8.3.4
mypy: 1.15.0
IPython: 8.17.2
sphinx: 8.2.1
Thanks for the clear report. I agree, we should fix this. It will likely require some adjustments to how Xarray writes netCDF files, to make it more "data tree" aware.
I think I have seen this as well, with the consequence that Panoply reports an error and does not display the leaf-node variables. I believe this happens because I created the coordinate variables at the root level and removed them from the leaf nodes, yet the dimension is still being defined at the leaf level. Panoply then sees a leaf dimension that has no coordinate data.
My use case is to "hoist" duplicate coordinate variables to the root level. I created the coordinate variables at the root and dropped them from the leaf nodes. However, even after a follow-up top-down copy of the data, in which I expected the root-level coordinates to be inherited by the leaf nodes, I cannot find any way to produce a leaf-level variable whose dimensions refer to an inherited coordinate and that then displays correctly in Panoply.
For context: my team is attempting to migrate our application development entirely to xarray. Our latest application is an "annotator" that attempts to supply missing coordinate variables and attributes for data that is not fully CF compliant. Being able to create shared, inherited coordinates, so that leaf variables display correctly in tools such as Panoply, is an important feature for that application.
I have verified that the symptom is caused by dimension declarations at the leaf level, by “fixing” the issue in NetCDF. I am able to copy the data recursively using the NetCDF library, skipping the creation of any dimension that already exists via inheritance. The resulting NetCDF file can be opened and viewed in Panoply.
Weren’t they intended to be redefined by design? See "Coordinate inheritance for xarray.DataTree".
Weren’t they intended to be redefined by design?
My understanding (from the implementation of #9077 in https://github.com/pydata/xarray/pull/9063) is that the decision was to provide coordinate inheritance with strict alignment, partly because it was more intuitive, provided more consistency, and was similar to, though not exactly the same as, the CF conventions. The alternative data model, with independent coordinates between nodes and their parents, was rejected, partly because it required duplicating coordinates at different levels of the tree, even though it would have offered more flexibility for quirky datasets (I believe that flexibility can currently still be handled by open_dict_of_datasets).
So, I think the duplication reported in this issue that is happening when writing the datatree to a netCDF file is not really following the current paradigm for datatree inheritance in xarray.
I should perhaps have clarified that my "fix" applies only to data that is "aligned" in xarray terminology (hopefully I got that right). To reiterate: the fix is to not create a lower-level dimension definition (in NetCDF) if an inherited, higher-level dimension of the same name exists. The fix will not work for NetCDF files that intentionally create "masking" dimension coordinates at lower levels while dimensions of the same name exist higher in the tree. But then, I think such data is not supported by xarray either.