
Indexing tree should create new tree

Open TomNicholas opened this issue 2 years ago • 1 comment

Inspired by this example in the stackstac documentation:

lowcloud = stack[stack["eo:cloud_cover"] < 20]

we should ensure that you can index a datatree with another (isomorphic) datatree, so that the above operation would work even if stack is a DataTree instance.

This is another map_over_subtree-type operation, but it needs careful testing because the __getitem__ function in xarray objects already does so many different things. This won't work with the code as-is because at the moment the DataTree naively dispatches the __getitem__ call down to the wrapped dataset.

https://github.com/xarray-contrib/datatree/blob/cd0695160e261466efc7f51fece02ca9bea2101c/datatree/datatree.py#L238
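A rough sketch of what dispatching on the key type could look like, using a toy class rather than the real DataTree. `ToyNode`, `_index_with_tree`, and the convention of storing one boolean list under `"mask"` are all hypothetical names made up for illustration; none of this is the datatree API.

```python
class ToyNode:
    """Hypothetical stand-in for a tree node (not the datatree API)."""

    def __init__(self, variables=None, children=None):
        self.variables = dict(variables or {})  # variable name -> list of values
        self.children = dict(children or {})    # child name -> ToyNode

    def __getitem__(self, key):
        if isinstance(key, ToyNode):
            # A tree as key: index node-wise and return a new tree.
            return self._index_with_tree(key)
        if isinstance(key, str) and key in self.children:
            # A string naming a group: return that child node.
            return self.children[key]
        # Everything else falls through to this node's own data, which
        # mirrors the naive dispatch to the wrapped dataset.
        return self.variables[key]

    def _index_with_tree(self, mask):
        if set(mask.children) != set(self.children):
            raise ValueError("trees are not isomorphic")
        # Assumption: each mask node holds one boolean list under "mask".
        flags = mask.variables.get("mask")
        if flags is None:
            new_vars = dict(self.variables)
        else:
            new_vars = {
                name: [v for v, keep in zip(values, flags) if keep]
                for name, values in self.variables.items()
            }
        new_children = {
            name: child._index_with_tree(mask.children[name])
            for name, child in self.children.items()
        }
        return ToyNode(new_vars, new_children)


stack = ToyNode(children={"a": ToyNode({"eo:cloud_cover": [5, 30, 12]})})
mask = ToyNode(children={"a": ToyNode({"mask": [True, False, True]})})
lowcloud = stack[mask]  # a new tree; `stack` itself is left untouched
```

The point is only that the tree-valued key has to be intercepted before the call reaches the wrapped dataset, and that the result is built as a fresh tree.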

TomNicholas • Apr 22 '22 15:04

To clarify, in order for this to work several things need to happen:

  1. stack["eo:cloud_cover"] needs to realise that "eo:cloud_cover" is neither a tree nor a group in the tree, but a variable name. Then it needs to select the "eo:cloud_cover" variable from all nodes in the subtree, and return a tree containing only those variables. That in itself requires something like #67, but ignoring nodes for which that variable is not present, at least for deeply-nested trees...
  2. stack["eo:cloud_cover"] < 20 needs to perform this comparison node-wise, returning a tree of results (hopefully this should already work...).
  3. stack[stack["eo:cloud_cover"] < 20] needs to use the tree passed to perform a node-wise indexing operation, returning a new tree. (Or we could just use .where.)
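The three steps above can be sketched with plain dictionaries standing in for the tree (path -> {variable name -> list of values}). This is only an illustration under stated assumptions: the helper names `select_variable`, `less_than`, and `index_with_mask` are hypothetical, and leaving nodes without the variable unchanged in step 3 is one possible choice, not a decision made in this issue.

```python
def select_variable(tree, name):
    """Step 1: keep only `name`, skipping nodes that do not have it."""
    return {path: {name: node[name]} for path, node in tree.items() if name in node}


def less_than(tree, threshold):
    """Step 2: node-wise comparison, returning a tree of boolean lists."""
    return {
        path: {name: [v < threshold for v in values] for name, values in node.items()}
        for path, node in tree.items()
    }


def index_with_mask(tree, mask_tree):
    """Step 3: node-wise boolean indexing, returning a new tree."""
    result = {}
    for path, node in tree.items():
        mask_node = mask_tree.get(path)
        if not mask_node:
            # Assumption: nodes with no mask are copied over unchanged.
            result[path] = dict(node)
            continue
        (flags,) = mask_node.values()  # exactly one boolean variable per mask node
        result[path] = {
            name: [v for v, keep in zip(values, flags) if keep]
            for name, values in node.items()
        }
    return result


stack = {
    "/": {},
    "/a": {"eo:cloud_cover": [5, 30, 12], "red": [1, 2, 3]},
    "/b": {"eo:cloud_cover": [50, 10], "red": [4, 5]},
    "/c": {"red": [6]},
}

cloud = select_variable(stack, "eo:cloud_cover")  # step 1
mask = less_than(cloud, 20)                       # step 2
lowcloud = index_with_mask(stack, mask)           # step 3
```

Each step is a separate code path, which is why the one-line version exercises so much of `__getitem__` at once.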

Basically this is a really complicated usage example because it uses multiple different code-paths within __getitem__ sequentially within one line of user code :sweat_smile:

TomNicholas • Apr 22 '22 16:04