Support multiple items in DataTree.__getitem__ and improve NodePath (renamed to TreePath)
This PR adds support for indexing with multiple items as a list of paths in DataTree.__getitem__, e.g., tree[['first', 'second']].
It also includes internal improvements to NodePath (now renamed to TreePath):
- Rename NodePath to TreePath to make its name slightly more obvious
- Automatically normalize paths in the TreePath constructor
- Use joinpath() and normalized tree paths to simplify implementations of _get_item and _set_item.
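To illustrate what normalizing a tree path can look like, here is a minimal sketch built on pathlib.PurePosixPath. The normalize helper is hypothetical and is not the TreePath implementation from this PR. In particular, silently ignoring ".." at the root is an assumption made for the sketch, and the real class may behave differently (for example, by raising).

```python
from pathlib import PurePosixPath


def normalize(path: str) -> PurePosixPath:
    """Collapse '..' segments purely lexically (hypothetical helper).

    PurePosixPath already drops '.' segments and repeated slashes,
    so only '..' needs explicit handling here.
    """
    out: list[str] = []
    for part in PurePosixPath(path).parts:
        if part != "..":
            out.append(part)
        elif out and out[-1] not in ("/", ".."):
            out.pop()  # '..' cancels the previous named segment
        elif out and out[-1] == "/":
            continue  # assumption: '..' at the root is ignored
        else:
            out.append("..")  # relative path climbing above its start
    return PurePosixPath(*out)


# joinpath() followed by normalization gives one canonical path per node.
joined = PurePosixPath("/first").joinpath("../second")
print(normalize(str(joined)))      # /second
print(normalize("a/./b/../c"))     # a/c
```

With every path normalized up front, lookups and assignments only need to walk plain child names, which is presumably what makes _get_item and _set_item simpler.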
looks like there was a similar attempt in #10400, in case it helps
According to our policy, we can drop python=3.11 from 2026-04-04 onwards. You can simulate this by passing --today to minimum_versions.py:
python minimum_versions.py --policy ci/policy.yaml --today 2026-04-04 ci/requirements/min-all-deps.yml
This is ready for review.
The main thing this could use is clear documentation, to explain that in the case of indexing multiple keys, the resulting DataTree is always defined relative to the node being indexed. This is rather different from the API proposed in https://github.com/pydata/xarray/pull/10400, which tries to index the selected variables at each node.
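As a rough illustration of "defined relative to the node being indexed", here is a toy sketch that uses nested dicts in place of a DataTree. The subset_paths function and the example data are hypothetical. This only mimics the semantics described above and is not the code in this PR.

```python
import copy


def subset_paths(node: dict, paths: list[str]) -> dict:
    """Return a new nested dict holding only the listed paths.

    Paths are interpreted relative to `node`, and the result keeps
    the same relative layout (toy stand-in for a DataTree).
    """
    result: dict = {}
    for path in paths:
        keys = [k for k in path.split("/") if k]
        src, dst = node, result
        for key in keys[:-1]:
            src = src[key]                  # descend in the source
            dst = dst.setdefault(key, {})   # mirror the parent in the result
        dst[keys[-1]] = copy.deepcopy(src[keys[-1]])
    return result


tree = {
    "first": {"x": 1, "y": 2},
    "second": {"z": 3},
    "third": {"w": 4},
}

# Indexing from the root keeps 'first' and 'second' as children of the result.
print(subset_paths(tree, ["first/x", "second"]))
# {'first': {'x': 1}, 'second': {'z': 3}}

# Indexing from a child node: paths and result are relative to that node.
print(subset_paths(tree["first"], ["x"]))
# {'x': 1}
```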
Ideally we could supply this functionality in a dedicated method (which would also make it easier to document), e.g., DataTree.subset() as we discussed last week at the Xarray meeting. This could be similar to the existing discussion about adding a public API for Dataset._copy_listed(): https://github.com/pydata/xarray/issues/3894
cc @eni-awowale
Is the intention here that:
a. DataTree.subset() and DataTree.__getitem__(list) do the same thing (both when the entries in the list refer to variables and when they refer to nodes),
b. we only have DataTree.subset(), or
c. we have both, but there is some difference in behaviour between them?
Yes, I was thinking option (a), both for DataTree and eventually Dataset. subset() is more discoverable for new users, but __getitem__ is what users would expect based on longstanding Dataset behavior.
We could put this functionality only on subset() but I don't see much downside in duplicating it, with __getitem__ as the convenience API. That's pretty standard with how we use it elsewhere in Xarray.
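A minimal sketch of option (a), assuming a toy container rather than the real DataTree: the ToyTree class is hypothetical, and the only point is that __getitem__ with a list simply forwards to subset(), so the two cannot drift apart.

```python
class ToyTree:
    """Toy stand-in for a tree node with named children (hypothetical)."""

    def __init__(self, children: dict):
        self.children = dict(children)

    def subset(self, keys: list[str]) -> "ToyTree":
        # Discoverable, documentable entry point for multi-key selection.
        return ToyTree({key: self.children[key] for key in keys})

    def __getitem__(self, key):
        if isinstance(key, list):
            # Convenience API: a list of keys delegates to subset().
            return self.subset(key)
        return self.children[key]


t = ToyTree({"first": 1, "second": 2, "third": 3})
print(t[["first", "second"]].children)  # {'first': 1, 'second': 2}
print(t["third"])                       # 3
```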
Quick summary of what was discussed in today's meeting: a future .subset() method should be able to subset given a new path, and should be able to update the name of the tree.