`parent` of new nodes is not assigned properly in `update`
there's something weird happening when assigning nodes using update:
In [1]: import datatree
...: import xarray as xr
...:
...: tree1 = datatree.DataTree.from_dict({"/a/b": xr.Dataset()})
...: node = datatree.DataTree()
...:
...: tree2 = tree1.copy()
...: tree2.update({"c": node})
...:
...: for node in tree3.subtree:
...: print(node.path, node.name, id(node.root))
/ None 140512169735136
/a a 140512169735136
/a/b b 140512169735136
/ c 140512169735248
(the parent of node is None, so its root is the node itself)
So it seems with update / assign you can end up with a inconsistent tree? With #201, that inconsistent tree fails to copy because relative_to raises an error.
some more issues I encountered:
- overwriting existing nodes fails, because the code is using the constructor of
OrderedDictto merge together two dictionaries (which is aTypeErrorif there's duplicate keys): https://github.com/xarray-contrib/datatree/blob/72d31f2fc62f38060c1adee748fd98fa3dc04465/datatree/datatree.py#L813 To fix that, we can useOrderedDict({**d1, **d2}). - assigning paths is not possible because the key is assigned as the name. We can avoid the error by assigning just the last component of the path:
new_child.name = NodePath(k).name. However, then the path would have to actually be used, which would make setting the parent to fix the inconsistent tree state a bit more complicated. Maybe that code can be shared with__setitem__/_set_item?
Oh dear - thanks for raising this @keewis .
To fix that, we can use OrderedDict({**d1, **d2}).
That seems reasonable.
Maybe that code can be shared with
__setitem__/_set_item?
In general I think that right now there are two public codepaths that change nodes: 90% of methods eventually go through _set_item internally, and .update is the only exception. The implementation difference was to do with update being an in-place method, and in-place methods are very rare in xarray. If we can find a way to make everything go through one code path it would help avoid these types of bugs. For example I think the code that checks if a change has made a tree inconsistent is possibly only triggered by _set_item currently.
Actually on 2., I'm not sure we should allow assigning by path in .update. DataTree has this fundamental ambiguity between "a node is a single flat dict mapping keys to immediate children" and "a node is a nested thing mapping path-like keys to every node in the tree".
What I've tried to do is make the standard dict-like methods (i.e. .keys, .values, .items, .get, __contains__, __iter__, __len__) refer to only immediate children (and variables) of this node. That's why I put them in a special section in the API docs. (Yes the __getitem__ and __setitem__ methods accept paths, but that was meant to be the convenient exception to the rule. :sweat_smile: ) I didn't support update-ing with paths to other nodes because I wanted to follow this flat-dict-like pattern.
closed in favor pydata/xarray#9345