xarray Lingering memory connections when extracting underlying `np.arrays` from datasets

What is your issue?

I know that generally, ds2 = ds makes both names refer to the same object in memory, so changes made through one also show up in the other.

However, I generally assume that certain operations should break this connection, for example:

- extracting the underlying np.array from a dataset (changing its type and dropping a lot of the xarray-specific information: index, dimensions, etc.)
- using the underlying np.array in a new dataset

In other words, I would expect that using ds['var'].values would be similar to copy.deepcopy(ds['var'].values).

Here's an example that illustrates how in these cases, the objects are still linked in memory:

(apologies for the somewhat hokey example)

import xarray as xr
import numpy as np

# Create a dataset
ds = xr.Dataset(coords = {'lon':(['lon'],np.array([178.2,179.2,-179.8, -178.8,-177.8,-176.8]))})
print('\nds: ')
print(ds)

# Create a new dataset that uses the values of the first dataset
ds2 = xr.Dataset({'lon1':(['lon'],ds.lon.values)},
                  coords = {'lon':(['lon'],ds.lon.values)})
print('\nds2: ')
print(ds2)

# Change ds2's 'lon1' variable 
ds2['lon1'][ds2['lon1']<0] = 360 + ds2['lon1'][ds2['lon1']<0]

# `ds2` is changed as expected
print('\nds2 (should be modified): ')
print(ds2)

# `ds` is changed, which is *not* expected
print('\nds (should not be modified): ')
print(ds)

The question is - am I right (from a UX perspective) to expect these kinds of operations to disconnect the objects in memory? If so, I might try to update the docs to be a bit clearer on this. (or, alternatively, if these kinds of operations should disconnect the objects in memory, maybe it's better to have .values also call .copy(deep=True).values)

Appreciate y'all's thoughts on this!

Feb 09 '24 18:02 ks905383

In general, you're expected to deep-copy explicitly to break these "links". This is the numpy paradigm.
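A minimal sketch of what the explicit copy looks like in practice, assuming a plain numpy-backed data variable. The dataset and the names `linked` and `independent` are made up for illustration:

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset with a plain (non-index) data variable
ds = xr.Dataset({"t": (["x"], np.array([1.0, 2.0, 3.0]))})

linked = ds["t"].values              # the array xarray already holds, not a copy
independent = ds["t"].values.copy()  # explicit copy into a new buffer

linked[0] = 99.0       # also visible through ds
independent[1] = -1.0  # ds is unaffected

print(np.shares_memory(linked, ds["t"].values))       # True
print(np.shares_memory(independent, ds["t"].values))  # False
print(ds["t"].values)
```

`ds["t"].copy(deep=True).values` should give the same independence if you would rather copy on the xarray side.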

Feb 09 '24 18:02 dcherian

If you want to read up on this, look for "view vs copy"!
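As a quick illustration of "view vs copy" in plain numpy (a sketch, with made-up variable names): basic slicing returns a view on the same buffer, while `.copy()` allocates a new one, and `np.shares_memory` tells the two apart.

```python
import numpy as np

a = np.arange(6.0)

view = a[2:]         # basic slicing returns a view on a's buffer
copy = a[2:].copy()  # .copy() allocates a separate buffer

a[2] = -1.0  # write through the original array

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False
print(view[0], copy[0])           # -1.0 2.0
```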

Feb 09 '24 19:02 max-sixty

Yeah, I guess in this case from a legibility standpoint, the fact that .values 'changes' (from the user point of view) the form (and type) of the data from a DataArray to the underlying numpy array just feels different?

Like I wouldn't expect the following two operations:

a = np.ones(3)
b = a.astype(str)
a[0] = 5
print(b)

and

a = np.ones(3)
b = a
a[0] = 5
print(b)

to behave the same. But I do understand that from the backend perspective, .values seems to be more of the latter than the former, since it is just accessing something that's already there...
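The difference between the two snippets can be checked directly. This sketch uses `np.shares_memory` and an identity check: `astype` (with its default `copy=True`) allocates a new array, while plain assignment only adds a second name for the same array.

```python
import numpy as np

a = np.ones(3)
converted = a.astype(str)  # new array with a string dtype
alias = a                  # second name for the same array

a[0] = 5

print(np.shares_memory(a, converted))  # False
print(alias is a)                      # True
print(converted[0], alias[0])          # 1.0 5.0
```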

(relatedly, would it be worth it to link to the relevant numpy docs in this part of the xarray docs?)

Feb 09 '24 19:02 ks905383

A related issue is that this allows you to (possibly inadvertently) circumvent certain xarray safeguards, like the TypeError around not being able to modify IndexVariables:

import xarray as xr

# Create sample dataset
ds = xr.Dataset({'test':(['lon'],[5,6,7])},coords = {'lon':(['lon'],[0,1,2])})

# Raises TypeError, to avoid changing indices like this
try:
    ds['lon'][0] = 2
except TypeError as err:
    print(err)

# Now, extract the underlying numpy array
a = ds.lon.values

# Change value
a[0] = 2

# This changes `ds` without raising an error
print(ds)

Feb 09 '24 19:02 ks905383

> (relatedly, would it be worth it to link to the relevant numpy docs in this part of the xarray docs?)

Yes! That would be a welcome contribution.

> A related issue is that this allows you to (possibly inadvertently) circumvent certain xarray safeguards, like the TypeError around not being able to modify IndexVariables:

Yes. But I'm not sure there's much we can do about this. Our focus should be "if you use xarray operations, you won't get surprises"...
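For instance, the longitude wrap from the original example can be written with xarray operations only. This is a sketch under the assumption that a new variable `lon1` on a new dataset is what's wanted. `DataArray.where` and `Dataset.assign` both return new objects, so `ds` is left alone:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"lon": (["lon"], np.array([178.2, 179.2, -179.8, -178.8]))})

# where() keeps values where the condition holds and uses the second argument elsewhere
lon_0_360 = ds["lon"].where(ds["lon"] >= 0, ds["lon"] + 360)

# assign() returns a new dataset; ds itself is not modified
ds2 = ds.assign(lon1=lon_0_360)

print(ds2["lon1"].values)  # wrapped to 0..360
print(ds["lon"].values)    # still contains negative values
```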

Feb 09 '24 19:02 max-sixty

> Yes! That would be a welcome contribution.

Sounds good, I'll prep a PR

Feb 09 '24 20:02 ks905383

Resolved by #8744.

Jun 05 '24 11:06 kmuehlbauer
