
Testing DataArray equality using built-in '==' operator leads to mutilated DataArray.attrs dictionary

Open l-johnston opened this issue 1 year ago • 3 comments

What happened?

In previous versions of xarray, testing the numerical equivalence of two DataArrays was possible using the built-in operator '==' without side effects. Now, in version 2022.6.0, when one DataArray lacks an attribute that the other DataArray has, the DataArray with the attribute is mutilated during the comparison, leaving it with an empty attrs dictionary.

What did you expect to happen?

DataArray_1 == DataArray_2 should not have side effects.

Minimal Complete Verifiable Example

import xarray as xr
da_withunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
da_withunits.frequency.attrs["units"] = "GHz"
print(da_withunits.frequency.units)
da_withoutunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
print(da_withunits == da_withoutunits)
print(da_withunits.frequency.units)
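
The expectation that a comparison leaves its operands untouched can be checked mechanically. The following is a minimal sketch rather than anything from xarray itself: `attrs_changed_by` and `LeakyArray` are hypothetical names introduced only for illustration, and `LeakyArray` is a deliberately buggy stand-in so that the snippet runs without xarray installed.

```python
import copy


def attrs_changed_by(operation, get_attrs):
    """Return True if calling operation() alters the dict returned by get_attrs()."""
    before = copy.deepcopy(get_attrs())  # snapshot taken before the operation
    operation()
    return get_attrs() != before


class LeakyArray:
    """Hypothetical stand-in (NOT an xarray class) whose == clears its own attrs."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)

    def __eq__(self, other):
        self.attrs.clear()  # the unwanted side effect, simulated on purpose
        return True


a = LeakyArray({"units": "GHz"})
b = LeakyArray({})

# The comparison mutates its first operand, so the helper reports a change.
changed = attrs_changed_by(lambda: a == b, lambda: a.attrs)

# An operation that does nothing leaves the attrs alone.
unchanged = attrs_changed_by(lambda: None, lambda: b.attrs)
```

With the arrays from the example above, the same helper could presumably be called as `attrs_changed_by(lambda: da_withunits == da_withoutunits, lambda: da_withunits.frequency.attrs)`.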

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

GHz
<xarray.DataArray (frequency: 3)>
array([ True,  True,  True])
Coordinates:
  * frequency  (frequency) int32 1 2 3
Traceback (most recent call last):
  File "d:\projects\ssdv\mvce.py", line 9, in <module>
    print(da_withunits.frequency.units)
  File "...\AppData\Local\Programs\Python\Python39\lib\site-packages\xarray\core\common.py", line 256, in __getattr__
    raise AttributeError(
AttributeError: 'DataArray' object has no attribute 'units'

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
libhdf5: None
libnetcdf: None

xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.23.1
scipy: 1.9.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.2.0
pip: 22.2.1
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: None

l-johnston, Jul 31 '22 17:07

~can you try if setting keep_attrs=True helps?~

That's wrong, I can reproduce the side effects. Not sure where that's coming from, though. Interestingly, only the first operand is mutated: da_withoutunits == da_withunits does not drop the units on da_withunits.

keewis, Jul 31 '22 17:07

keep_attrs=True doesn't help

In [1]: import xarray as xr
In [2]: xr.set_options(keep_attrs=True)
Out[2]: <xarray.core.options.set_options at 0x1789a959fa0>
In [3]: da_withunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
In [4]: da_withunits.frequency.attrs["units"] = "GHz"
In [5]: da_withunits.frequency.units
Out[5]: 'GHz'
In [6]: da_withoutunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
In [7]: da_withunits == da_withoutunits
Out[7]:
<xarray.DataArray (frequency: 3)>
array([ True,  True,  True])
Coordinates:
  * frequency  (frequency) int32 1 2 3

In [8]: da_withunits.frequency.units
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 da_withunits.frequency.units
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\xarray\core\common.py:256, in AttrAccessMixin.__getattr__(self, name)
    254         with suppress(KeyError):
    255             return source[name]
--> 256 raise AttributeError(
    257     f"{type(self).__name__!r} object has no attribute {name!r}"
    258 )
AttributeError: 'DataArray' object has no attribute 'units'

l-johnston, Jul 31 '22 18:07

Bisecting tells me this is a regression introduced by #6389. Looking at the code, this happens because copying the variables with variables.copy() makes a shallow copy of the dictionary only (and not of its values), so the copy still holds the very same variable objects and we end up mutating the Dataset's own variables. If I change that line to

# make a shallow copy of each variable
new_variables = {name: var.copy() for name, var in variables.items()}

we stop mutating the dataset.
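
To illustrate the difference between the two kinds of copy without depending on xarray, here is a small pure-Python sketch. `Var` is a hypothetical stand-in class, not xarray's `Variable`; the sketch assumes its `copy()` returns a new object with its own attrs dictionary, which is what the proposed fix relies on.

```python
class Var:
    """Hypothetical stand-in for a variable carrying an attrs dict (not xarray code)."""

    def __init__(self, attrs=None):
        self.attrs = dict(attrs or {})

    def copy(self):
        # Assumption of this sketch: a copy gets its own attrs dict.
        return Var(self.attrs)


# Case 1: dict.copy() creates a new dict, but its values are the same objects.
variables = {"frequency": Var({"units": "GHz"})}
shallow = variables.copy()
same_object = shallow["frequency"] is variables["frequency"]
shallow["frequency"].attrs.clear()  # also clears the original's attrs
leaked = variables["frequency"].attrs

# Case 2: copying each value gives every entry its own object.
variables = {"frequency": Var({"units": "GHz"})}
per_value = {name: var.copy() for name, var in variables.items()}
per_value["frequency"].attrs.clear()  # the original is unaffected
preserved = variables["frequency"].attrs

print(same_object, leaked, preserved)
```

In the first case the attrs of the original entry are lost, which mirrors the symptom reported in this issue; in the second case they survive.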

cc @benbovy

keewis, Aug 01 '22 09:08