xarray
Testing DataArray equality using the built-in '==' operator leads to a mutated DataArray.attrs dictionary
What happened?
In previous versions of xarray, the numerical equivalence of two DataArrays could be tested with the built-in '==' operator without side effects. Now, in version 2022.6.0, when one DataArray lacks an attribute that the other DataArray has (in the example below, a "units" attribute on the frequency coordinate), the DataArray that has the attribute is mutated during the comparison and is left with an empty attrs dictionary.
What did you expect to happen?
DataArray_1 == DataArray_2 should not have side effects.
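The expectation above can be turned into a generic regression check: snapshot each operand, run the operation, and compare the snapshots. The sketch below is a hypothetical helper (check_no_side_effects is not part of xarray) and is demonstrated on plain dicts with a deliberately leaky toy comparison, so it runs without xarray installed.

```python
import copy
from typing import Any, Callable


def check_no_side_effects(op: Callable[..., Any], *operands: Any,
                          snapshot: Callable[[Any], Any] = copy.deepcopy) -> Any:
    """Call op(*operands); raise AssertionError if any operand's snapshot changed.

    `snapshot` must return something whose `!=` yields a plain bool
    (for example a dict of attrs), so before/after states can be compared.
    """
    before = [snapshot(x) for x in operands]
    result = op(*operands)
    after = [snapshot(x) for x in operands]
    for i, (b, a) in enumerate(zip(before, after)):
        if b != a:
            raise AssertionError(f"operand {i} was modified: {b!r} -> {a!r}")
    return result


def leaky_eq(a: dict, b: dict) -> bool:
    """Toy comparison that wrongly mutates its first operand."""
    a.pop("units", None)
    return a == b


# A side-effect-free comparison passes and returns its result.
print(check_no_side_effects(lambda a, b: a == b, {"units": "GHz"}, {"units": "GHz"}))  # True

# The leaky comparison is caught.
try:
    check_no_side_effects(leaky_eq, {"units": "GHz"}, {})
except AssertionError as err:
    print(err)  # operand 0 was modified: {'units': 'GHz'} -> {}
```

For DataArrays the default deepcopy snapshot is not suitable, because != between DataArrays returns an element-wise DataArray rather than a bool. A snapshot such as lambda da: {name: dict(c.attrs) for name, c in da.coords.items()}, passed together with operator.eq, da_withunits and da_withoutunits, would be expected to raise on the affected version, assuming the behavior reported here.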
Minimal Complete Verifiable Example
import xarray as xr
da_withunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
da_withunits.frequency.attrs["units"] = "GHz"
print(da_withunits.frequency.units)
da_withoutunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
print(da_withunits == da_withoutunits)
print(da_withunits.frequency.units)
MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
GHz
<xarray.DataArray (frequency: 3)>
array([ True, True, True])
Coordinates:
* frequency (frequency) int32 1 2 3
Traceback (most recent call last):
File "d:\projects\ssdv\mvce.py", line 9, in <module>
print(da_withunits.frequency.units)
File "...\AppData\Local\Programs\Python\Python39\lib\site-packages\xarray\core\common.py", line 256, in __getattr__
raise AttributeError(
AttributeError: 'DataArray' object has no attribute 'units'
Anything else we need to know?
No response
Environment
xarray: 2022.6.0 pandas: 1.4.3 numpy: 1.23.1 scipy: 1.9.0 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.2 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 63.2.0 pip: 22.2.1 conda: None pytest: 7.1.2 IPython: 8.4.0 sphinx: None
~Can you try whether setting keep_attrs=True helps?~
That's wrong, I can reproduce the side effects. Not sure where they're coming from, though. Interestingly, only the first operand is mutated: da_withoutunits == da_withunits does not drop the units on da_withunits.
keep_attrs=True doesn't help
In [1]: import xarray as xr
In [2]: xr.set_options(keep_attrs=True)
Out[2]: <xarray.core.options.set_options at 0x1789a959fa0>
In [3]: da_withunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
In [4]: da_withunits.frequency.attrs["units"] = "GHz"
In [5]: da_withunits.frequency.units
Out[5]: 'GHz'
In [6]: da_withoutunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
In [7]: da_withunits == da_withoutunits
Out[7]:
<xarray.DataArray (frequency: 3)>
array([ True, True, True])
Coordinates:
* frequency (frequency) int32 1 2 3
In [8]: da_withunits.frequency.units
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 da_withunits.frequency.units
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\xarray\core\common.py:256, in AttrAccessMixin.__getattr__(self, name)
254 with suppress(KeyError):
255 return source[name]
--> 256 raise AttributeError(
257 f"{type(self).__name__!r} object has no attribute {name!r}"
258 )
AttributeError: 'DataArray' object has no attribute 'units'
Bisecting tells me this is a regression introduced by #6389. Looking at the code, this happens because copying the variables with variables.copy() makes a shallow copy of the dictionary (and not of its values), which means that we're actually mutating the Dataset variables. If I change that line to
# copy each variable, not just the dictionary that holds them
new_variables = {name: var.copy() for name, var in variables.items()}
we stop mutating the dataset.
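The shallow-copy mechanism described above can be reproduced without xarray. The sketch below uses a hypothetical ToyVariable class (a stand-in, not xarray's Variable) and a hypothetical drop_attrs_on_copy step standing in for whatever part of the operation discards attributes. It only illustrates why copying the dict is not enough when its values are mutable objects, and why a per-variable copy fixes it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVariable:
    """Hypothetical stand-in for xarray.Variable: some data plus an attrs dict."""
    data: list
    attrs: dict = field(default_factory=dict)

    def copy(self) -> "ToyVariable":
        # New object with its own data list and its own attrs dict.
        return ToyVariable(list(self.data), dict(self.attrs))


def drop_attrs_on_copy(variables: dict, per_variable_copy: bool) -> dict:
    """Copy a name -> variable mapping, then reset attrs on the copy."""
    if per_variable_copy:
        # One new ToyVariable per entry: the originals are left alone.
        new_variables = {name: var.copy() for name, var in variables.items()}
    else:
        # dict.copy() duplicates the mapping only; both dicts hold the same objects.
        new_variables = variables.copy()
    for var in new_variables.values():
        var.attrs = {}  # rebinds attrs on whichever object is stored in the copy
    return new_variables


original = {"frequency": ToyVariable([1, 2, 3], {"units": "GHz"})}
drop_attrs_on_copy(original, per_variable_copy=False)
print(original["frequency"].attrs)  # {}

original = {"frequency": ToyVariable([1, 2, 3], {"units": "GHz"})}
drop_attrs_on_copy(original, per_variable_copy=True)
print(original["frequency"].attrs)  # {'units': 'GHz'}
```

Copying each value costs one new object per variable, but it is what breaks the aliasing between the temporary mapping and the caller's objects.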
cc @benbovy