xarray icon indicating copy to clipboard operation
xarray copied to clipboard

Creating a DataTree should not modify parent and children in-place

Open shoyer opened this issue 1 year ago • 1 comments

What is your issue?

This violates the well-established Python convention that functions either have a return value or modify arguments in-place. They never do both.

Examples:

  • Modifying a parent:
from xarray.core.datatree import DataTree

root = DataTree()
child = DataTree(name='child', parent=root)
print(root)
# <xarray.DataTree>
# Group: /
# └── Group: /child
  • Modifying children:
from xarray.core.datatree import DataTree

child = DataTree()
root = DataTree(children={'child': child})
print(child)
# <xarray.DataTree 'child'>
# Group: /child

This particularly surprising if a DataTree argument is reused, e.g.,

from xarray.core.datatree import DataTree

child = DataTree()
root = DataTree(children={'child': child})
root2 = DataTree(children={'child2': child})
print(child)  # attached to root2
# <xarray.DataTree 'child2'>
# Group: /child2
print(root)  # now empty!
# <xarray.DataTree>
# Group: /

Here's my suggestion:

  1. We should make the DataTree constructor make shallow copies of its DataTree arguments.
  2. We should consider getting rid of the parent constructor argument. It is redundant with children, and unlike children, parent nodes don't show up in the DataTree repr, so it's more surprising to see them copied.

CC @TomNicholas

shoyer avatar Jul 01 '24 00:07 shoyer

Good point, this wasn't really an intentional breaking of that convention on my part.

Happy with either / both of your suggestions.

I think I originally put parent in the constructor just so that all attributes were present in the signature, but there are multiple other ways to set the parent once the subtree has been created.

TomNicholas avatar Jul 01 '24 16:07 TomNicholas

To be clear the aim here is to do both of @shoyer 's suggestions, probably in 2 separate PRs

TomNicholas avatar Jul 09 '24 15:07 TomNicholas

I'm just finding this issue now. I Don't know why I didn't see it. I assume, this is going to make a bunch of changes required in the docs. Can you summarize what I'll need to look for? e.g I assume this isn't true any longer.

homer = xr.DataTree(name="Homer", children={"Bart": bart, "Lisa": lisa})
The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the [siblings](https://github.com/pydata/xarray/generated/xarray.DataTree.siblings.html#xarray.DataTree.siblings) property:

list(bart.siblings)
Out[5]: ['Lisa']

That assignment to homer will no longer make bart's sibling lisa? What am I missing?

flamingbear avatar Jul 31 '24 22:07 flamingbear

e.g I assume this isn't true any longer.

homer = xr.DataTree(name="Homer", children={"Bart": bart, "Lisa": lisa})
The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the [siblings](https://github.com/pydata/xarray/generated/xarray.DataTree.siblings.html#xarray.DataTree.siblings) property:

list(bart.siblings)
Out[5]: ['Lisa']

That assignment to homer will no longer make bart's sibling lisa? What am I missing?

Yes, you have this correct.

homer.children['Bart'] will have the sibling Lisa, but not the original bart object.

shoyer avatar Jul 31 '24 22:07 shoyer

This should be closed by #9297.

TomNicholas avatar Sep 07 '24 18:09 TomNicholas