xarray icon indicating copy to clipboard operation
xarray copied to clipboard

Dataset.copy(deep=True) does not deepcopy .attrs

Open kefirbandi opened this issue 6 years ago • 3 comments

But it would be expected (at least by me) that it does.

kefirbandi avatar Mar 21 '19 13:03 kefirbandi

Thanks @kefirbandi, can you send in a PR?

dcherian avatar Mar 21 '19 14:03 dcherian

Jumping from 2022.3.0 to 2022.6.0 this issue has re-emerged for me.

DerPlankton13 avatar Jul 25 '22 14:07 DerPlankton13

Jumping from 2022.3.0 to 2022.6.0 this issue has re-emerged for me.

I am seeing the same issue as well.

Ch-Meier avatar Jul 25 '22 14:07 Ch-Meier

Cannot reproduce on current master using

import xarray as xr

ds = xr.Dataset({"a": (["x"], [1, 2, 3])}, attrs={"t": 1})
ds2 = ds.copy(deep=True)
ds.attrs["t"] = 5
print(ds2.attrs)  # returns: {'t': 1}

headtr1ck avatar Sep 04 '22 14:09 headtr1ck

even with

In [1]: import xarray as xr
   ...: 
   ...: ds = xr.Dataset({"a": ("x", [1, 2, 3], {"t": 0})}, attrs={"t": 1})
   ...: ds2 = ds.copy(deep=True)
   ...: ds.attrs["t"] = 5
   ...: ds.a.attrs["t"] = 6
   ...: 
   ...: display(ds2, ds2.a)
<xarray.Dataset>
Dimensions:  (x: 3)
Dimensions without coordinates: x
Data variables:
    a        (x) int64 1 2 3
Attributes:
    t:        1
<xarray.DataArray 'a' (x: 3)>
array([1, 2, 3])
Dimensions without coordinates: x
Attributes:
    t:        0

I cannot reproduce. @Ch-Meier, @DerPlankton13, can either of you post a minimal example demonstrating the issue?

keewis avatar Sep 04 '22 17:09 keewis

Just for the record, I just ran into this for the specific case of nested dictionary attrs in Dataarray.attrs.

It's definitely an issue in 2022.3.0 and 2022.6.0. Here's a minimal test example in case anyone else runs into this too...

# MINIMAL EXAMPLE

import xarray as xr
import numpy as np

data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
data.attrs['flat']='0'
data.attrs['nested']={'level1':'1'}

data2 = data.copy(deep=True)
data2.attrs['flat']='2'  # OK
# data2.attrs['nested']={'level1':'2'}  # OK
# data2.attrs['nested']['level1'] = '2'  # Fails - overwrites data
data2.attrs['nested'].update({'level1':'2'})  # Fails - overwrites data

print(data.attrs)
print(data2.attrs)

Outputs

In XR 2022.3.0 and 2022.6.0 this gives (incorrect):

{'flat': '0', 'nested': {'level1': '2'}}
{'flat': '2', 'nested': {'level1': '2'}}

As a work-around, safe attrs copy with deepcopy works:

data2 = data.copy(deep=True)
data2.attrs = copy.deepcopy(data.attrs)

With correct results after modification:

{'flat': '0', 'nested': {'level1': '1'}}
{'flat': '2', 'nested': {'level1': '2'}}

EDIT 26th Sept: retested in 2022.6.0 and found it was, in fact, failing there too. Updated comment to reflect this.

phockett avatar Sep 23 '22 18:09 phockett

I see, so it was a problem of recursively copying the attrs. @phockett would you be open to add your example as a test in a PR such that we can avoid this in the future?

headtr1ck avatar Sep 23 '22 18:09 headtr1ck

Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will make a stab at it. I'm just consulting the Contributing guide now.

phockett avatar Sep 26 '22 15:09 phockett

Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will make a stab at it. I'm just consulting the Contributing guide now.

Great! Just ask if you need help :)

Basically: fork repo, make new branch on your forked repo, add test, create PR (when you push GitHub should show a popup asking if you want to create a PR anyway).

headtr1ck avatar Sep 26 '22 15:09 headtr1ck

OK, new test now pushed as #7086. (Hopefully added in the right place and style!)

A couple of additional notes:

  • Revision to my comment above: this actually fails in 2022.3 and 2022.6 for nested attribs.
  • I took a look at the source code in dataarray.py, but couldn't see an obvious way to fix this and/or didn't understand the attrs copying process generally.
  • I tested the equivalent case for DataSet attrs too (see below), and this seems fine as per your previous comments above, so I think https://github.com/pydata/xarray/pull/2839 (which includes a ds level test) still applies to ds.attrs, however the issue does affect the individual arrays within the dataset still (as expected).
import xarray as xr

ds = xr.Dataset({"a": (["x"], [1, 2, 3])}, attrs={"t": 1, "nested":{"t2": 1}})
ds.a.attrs = {"t": 'a1', "nested":{"t2": 'a1'}}

ds2 = ds.copy(deep=True)
ds.attrs["t"] = 5
ds.attrs["nested"]["t2"] = 10

ds2.a.attrs["t"] = 'a2'
ds2.a.attrs["nested"]["t2"] = 'a2'


print(ds.attrs)
print(ds.a.attrs)
print(ds2.attrs)
print(ds2.a.attrs)

Results in:

{'t': 5, 'nested': {'t2': 10}}
{'t': 'a1', 'nested': {'t2': 'a2'}}
{'t': 1, 'nested': {'t2': 1}}
{'t': 'a2', 'nested': {'t2': 'a2'}}

phockett avatar Sep 26 '22 19:09 phockett

Ok, I thought that copying attrs was fixed. Seems like it did not... I will create a PR to fix this first, then we can use your test :)

headtr1ck avatar Sep 26 '22 20:09 headtr1ck

Ok, I thought that copying attrs was fixed. Seems like it did not...

Sorry for the mix-up there - think I initially tested in 2022.6 with the extra data2.attrs = copy.deepcopy(data.attrs) implemented. Caught it with the new test routine 😄

phockett avatar Sep 26 '22 20:09 phockett