xarray
xarray copied to clipboard
Dataset.copy(deep=True) does not deepcopy .attrs
But it would be expected (at least by me) that it does.
Thanks @kefirbandi, can you send in a PR?
Jumping from 2022.3.0 to 2022.6.0 this issue has re-emerged for me.
Jumping from 2022.3.0 to 2022.6.0 this issue has re-emerged for me.
I am seeing the same issue as well.
Cannot reproduce on current master using
import xarray as xr
ds = xr.Dataset({"a": (["x"], [1, 2, 3])}, attrs={"t": 1})
ds2 = ds.copy(deep=True)
ds.attrs["t"] = 5
print(ds2.attrs) # returns: {'t': 1}
even with
In [1]: import xarray as xr
...:
...: ds = xr.Dataset({"a": ("x", [1, 2, 3], {"t": 0})}, attrs={"t": 1})
...: ds2 = ds.copy(deep=True)
...: ds.attrs["t"] = 5
...: ds.a.attrs["t"] = 6
...:
...: display(ds2, ds2.a)
<xarray.Dataset>
Dimensions: (x: 3)
Dimensions without coordinates: x
Data variables:
a (x) int64 1 2 3
Attributes:
t: 1
<xarray.DataArray 'a' (x: 3)>
array([1, 2, 3])
Dimensions without coordinates: x
Attributes:
t: 0
I cannot reproduce. @Ch-Meier, @DerPlankton13, can either of you post a minimal example demonstrating the issue?
Just for the record, I just ran into this for the specific case of nested dictionary attrs in Dataarray.attrs.
It's definitely an issue in 2022.3.0 and 2022.6.0. Here's a minimal test example in case anyone else runs into this too...
# MINIMAL EXAMPLE
import xarray as xr
import numpy as np
data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
data.attrs['flat']='0'
data.attrs['nested']={'level1':'1'}
data2 = data.copy(deep=True)
data2.attrs['flat']='2' # OK
# data2.attrs['nested']={'level1':'2'} # OK
# data2.attrs['nested']['level1'] = '2' # Fails - overwrites data
data2.attrs['nested'].update({'level1':'2'}) # Fails - overwrites data
print(data.attrs)
print(data2.attrs)
Outputs
In XR 2022.3.0 and 2022.6.0 this gives (incorrect):
{'flat': '0', 'nested': {'level1': '2'}}
{'flat': '2', 'nested': {'level1': '2'}}
As a work-around, safe attrs copy with deepcopy works:
data2 = data.copy(deep=True)
data2.attrs = copy.deepcopy(data.attrs)
With correct results after modification:
{'flat': '0', 'nested': {'level1': '1'}}
{'flat': '2', 'nested': {'level1': '2'}}
EDIT 26th Sept: retested in 2022.6.0 and found it was, in fact, failing there too. Updated comment to reflect this.
I see, so it was a problem of recursively copying the attrs. @phockett would you be open to add your example as a test in a PR such that we can avoid this in the future?
Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will make a stab at it. I'm just consulting the Contributing guide now.
Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will make a stab at it. I'm just consulting the Contributing guide now.
Great! Just ask if you need help :)
Basically: fork repo, make new branch on your forked repo, add test, create PR (when you push GitHub should show a popup asking if you want to create a PR anyway).
OK, new test now pushed as #7086. (Hopefully added in the right place and style!)
A couple of additional notes:
- Revision to my comment above: this actually fails in 2022.3 and 2022.6 for nested attribs.
- I took a look at the source code in
dataarray.py, but couldn't see an obvious way to fix this and/or didn't understand the attrs copying process generally. - I tested the equivalent case for DataSet attrs too (see below), and this seems fine as per your previous comments above, so I think https://github.com/pydata/xarray/pull/2839 (which includes a ds level test) still applies to
ds.attrs, however the issue does affect the individual arrays within the dataset still (as expected).
import xarray as xr
ds = xr.Dataset({"a": (["x"], [1, 2, 3])}, attrs={"t": 1, "nested":{"t2": 1}})
ds.a.attrs = {"t": 'a1', "nested":{"t2": 'a1'}}
ds2 = ds.copy(deep=True)
ds.attrs["t"] = 5
ds.attrs["nested"]["t2"] = 10
ds2.a.attrs["t"] = 'a2'
ds2.a.attrs["nested"]["t2"] = 'a2'
print(ds.attrs)
print(ds.a.attrs)
print(ds2.attrs)
print(ds2.a.attrs)
Results in:
{'t': 5, 'nested': {'t2': 10}}
{'t': 'a1', 'nested': {'t2': 'a2'}}
{'t': 1, 'nested': {'t2': 1}}
{'t': 'a2', 'nested': {'t2': 'a2'}}
Ok, I thought that copying attrs was fixed. Seems like it did not... I will create a PR to fix this first, then we can use your test :)
Ok, I thought that copying attrs was fixed. Seems like it did not...
Sorry for the mix-up there - think I initially tested in 2022.6 with the extra data2.attrs = copy.deepcopy(data.attrs) implemented. Caught it with the new test routine 😄