
setting node name breaks tree linkage

Open marcel-goldschen-ohm opened this issue 1 year ago • 7 comments

# a simple tree
root = DataTree(name='root')
child = DataTree(name='child', parent=root)
grandchild = DataTree(name='grandchild', parent=child)

# changing the name of a child node does not correctly update the dict key in its parent's children
child.name = 'childish'
print(root)  # this appears to be fine
print(list(root.children))  # however, the keys in root.children have not been updated
print(root['childish'])  # so this fails

A simple fix seems to be: wherever the name property is set, also update the corresponding key in self.parent.children. I'm not sure whether these keys are stored anywhere else that would also need updating.
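To illustrate the idea, here is a minimal sketch of a name setter that re-keys the node in its parent's children. The `Node` class is a hypothetical stand-in written for this example, not the datatree implementation, and it deliberately ignores name collisions and renaming to None.

```python
from typing import Dict, Optional


class Node:
    """Hypothetical stand-in for a tree node (NOT the datatree class)."""

    def __init__(self, name: Optional[str] = None, parent: Optional["Node"] = None):
        self._name = name
        self._parent = parent
        self.children: Dict[str, "Node"] = {}
        if parent is not None:
            # Register this node in the parent under its current name.
            parent.children[name] = self

    @property
    def name(self) -> Optional[str]:
        return self._name

    @name.setter
    def name(self, new_name: Optional[str]) -> None:
        parent = self._parent
        if parent is not None and new_name != self._name:
            # Rebuild the parent's dict so the renamed child keeps its position
            # and is stored under the new key.
            parent.children = {
                (new_name if child is self else key): child
                for key, child in parent.children.items()
            }
        self._name = new_name


root = Node(name="root")
child = Node(name="child", parent=root)
grandchild = Node(name="grandchild", parent=child)

child.name = "childish"
print(list(root.children))                 # ['childish']
print(root.children["childish"] is child)  # True
```

Rebuilding the dict rather than popping and re-inserting keeps the sibling order unchanged, which may or may not matter for the real class.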

marcel-goldschen-ohm avatar Feb 07 '24 18:02 marcel-goldschen-ohm

Thank you for reporting this! The offending setter is here

https://github.com/xarray-contrib/datatree/blob/0afaa6cc1d6800987d8b9c37a604dc0a8c68aeaa/datatree/treenode.py#L597

The setter should also update the key under which the node is stored in its parent.

This should be a pretty simple fix if you (or perhaps @etienneschalk ?) are interested in going in? (If not then no worries)

TomNicholas avatar Feb 08 '24 17:02 TomNicholas

Hello @TomNicholas

In the context of merging datatree into xarray, should new developments continue to be made on this repo, or in the xarray repo? Or is there a code freeze until datatree can be worked with from inside the xarray repo? Or simply, new developments happening here will be integrated into xarray with some git wizardry?

Edit: the answer is in the README: https://github.com/xarray-contrib/datatree?tab=readme-ov-file#deprecation-notice

etienneschalk avatar Feb 08 '24 18:02 etienneschalk

In the context of merging datatree into xarray, should new developments continue to be made on this repo, or in the xarray repo? Or is there a code freeze until datatree can be worked with from inside the xarray repo? Or simply, new developments happening here will be integrated into xarray with some git wizardry?

I think we accept bug fixes here, but not new features. And whilst those bugfixes will be moved to xarray, you won't necessarily get full attribution for them (i.e. I'll probably do it the dumb copy-paste way instead of the git wizardry way).

TomNicholas avatar Feb 08 '24 21:02 TomNicholas

But we should fix the bug here! Because people will still be using this repository for a while yet (as this is what is uploaded to pypi/conda as xarray-datatree)

TomNicholas avatar Feb 08 '24 21:02 TomNicholas

I'm happy to tackle the fix, but will be traveling for a conference that runs through most of next week, so probably wouldn't get to it until after that. If someone else wants to fix it before then, by all means ;)

marcel-goldschen-ohm avatar Feb 08 '24 21:02 marcel-goldschen-ohm

What should be the expected behaviour when renaming a child node to None?

I had a look at how xarray behaves when renaming a DataArray inside of a Dataset. It seems that the renaming is just ignored when trying to change the name property of the DataArray directly:

import xarray as xr

# https://docs.xarray.dev/en/stable/generated/xarray.DataArray.name.html

xds = xr.Dataset({"a": xr.DataArray([1])})
print(xds)
# <xarray.Dataset>
# Dimensions:  (dim_0: 1)
# Dimensions without coordinates: dim_0
# Data variables:
#     a        (dim_0) int64 1

print(xds["a"])
# <xarray.DataArray 'a' (dim_0: 1)>
# array([1])
# Dimensions without coordinates: dim_0

# setting the name through the dataset is silently ignored
xds["a"].name = "toto"
print(xds["a"])
# <xarray.DataArray 'a' (dim_0: 1)>
# array([1])
# Dimensions without coordinates: dim_0

# renaming the extracted DataArray works, but only on that object
xda = xds["a"]
xda.name = "toto"
print(xda)
# <xarray.DataArray 'toto' (dim_0: 1)>
# array([1])
# Dimensions without coordinates: dim_0

# the dataset itself is unchanged
print(xds)
# <xarray.Dataset>
# Dimensions:  (dim_0: 1)
# Dimensions without coordinates: dim_0
# Data variables:
#     a        (dim_0) int64 1

etienneschalk avatar Feb 17 '24 11:02 etienneschalk

@etienneschalk, I find that to be very counterintuitive behavior. My naive expectation would be that the variable should be renamed as desired and the dataset updated to reflect that, and if there was any issue (like renaming to None or to the name of another variable) an exception would be raised. Of course, this is an xarray issue.
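As a rough sketch of the behaviour I would expect, something like the following: the rename either succeeds and updates the mapping, or raises. The `rename_child` helper is hypothetical and operates on a plain dict; it is not part of xarray or datatree, and the choice of exception types is just an assumption.

```python
from typing import Any, Dict, Optional


def rename_child(children: Dict[str, Any], old_name: str, new_name: Optional[str]) -> None:
    """Hypothetical helper: re-key one entry of a children mapping in place."""
    if new_name is None:
        raise ValueError("a child node must have a name")
    if old_name not in children:
        raise KeyError(old_name)
    if new_name == old_name:
        return
    if new_name in children:
        raise ValueError(f"a sibling named {new_name!r} already exists")
    # Rebuild the entries so the renamed child keeps its position.
    items = [(new_name if key == old_name else key, value)
             for key, value in children.items()]
    children.clear()
    children.update(items)


children = {"a": "node A", "b": "node B"}
rename_child(children, "a", "toto")
print(list(children))  # ['toto', 'b']
```

With this, `rename_child(children, "toto", "b")` and `rename_child(children, "toto", None)` would both raise rather than being silently ignored.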

marcel-goldschen-ohm avatar Mar 01 '24 16:03 marcel-goldschen-ohm

Closing in favour of the discussion upstream in https://github.com/pydata/xarray/issues/9447

TomNicholas avatar Sep 09 '24 18:09 TomNicholas