generate datatree methods

Open mathause opened this issue 10 months ago • 2 comments

  • [x] Closes #10015
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Add a script that generates a mixin so that Dataset methods are available on DataTree. It uses inspect.signature to regenerate each call signature, plus a decorator that forwards *args, **kwargs so the method bodies do not need to be populated. This makes the generation relatively trivial (although maybe not trivial to understand).
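A minimal sketch of the idea described above, not the script from this PR: ToyDataset, ToyTree, forward_to_dataset and generate_stub are hypothetical names, and the toy classes only stand in for Dataset and DataTree. It shows inspect.signature being used to write out a stub with the full signature, and a decorator that ignores the empty stub body and forwards *args, **kwargs to the same-named dataset method on every node.

```python
import functools
import inspect


class ToyDataset:
    """Hypothetical stand-in for xarray.Dataset (illustration only)."""

    def __init__(self, variables):
        self.variables = dict(variables)

    def rename(self, name_dict=None, **names):
        """Return a new ToyDataset with variables renamed."""
        mapping = {**(name_dict or {}), **names}
        return ToyDataset({mapping.get(k, k): v for k, v in self.variables.items()})


def forward_to_dataset(stub):
    """Replace an empty stub with a method that maps the same-named
    ToyDataset method over every node of the tree."""
    dataset_method = getattr(ToyDataset, stub.__name__)

    @functools.wraps(stub)  # keeps the stub's name, docstring and signature
    def method(self, *args, **kwargs):
        return type(self)(
            {path: dataset_method(ds, *args, **kwargs) for path, ds in self.nodes.items()}
        )

    return method


def generate_stub(name):
    """Return source text for one stub whose signature is written out in full."""
    signature = inspect.signature(getattr(ToyDataset, name))
    return f"@forward_to_dataset\ndef {name}{signature}:\n    ...\n"


# A real generator would write this text to a file; here it is executed directly.
source = generate_stub("rename")
namespace = {"forward_to_dataset": forward_to_dataset}
exec(source, namespace)


class ToyTree:
    """Hypothetical stand-in for DataTree: a flat mapping of path -> ToyDataset."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)

    rename = namespace["rename"]


tree = ToyTree({"/": ToyDataset({"a": 1}), "/child": ToyDataset({"a": 2, "b": 3})})
renamed = tree.rename(a="x")
```

In this sketch the stub body is never executed; only its name and signature are used. Writing signatures out as text only works when every default value has a repr that is valid Python, which may be one reason a generated file would need manual fixes.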

This is much clunkier than generate_ops or generate_aggregations, and we cannot profit from common signatures. Thus:

  • the docstring is not adapted
  • the examples are not adapted
  • the generated file needs to be fixed and formatted with ruff before use

However, it's a fraction of the work of doing this properly. I am really not sure if this is a good idea - feel free to tell me it's not!

mathause avatar Mar 18 '25 17:03 mathause

The alternative is to inject everything (as in https://github.com/xarray-contrib/datatree/blob/5f3956ffe80e686dd3df54ee8cef9ff56c158e76/datatree/ops.py#L223). (Or to write all methods out, or create mixin classes that work for all data types...)
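For comparison, the injection alternative could look roughly like the sketch below. It is an illustration under assumptions, not the code from the linked ops.py: ToyDataset, ToyTree and map_over_nodes are hypothetical names, and the method list is made up. The wrapped methods are attached with setattr at import time instead of being written to a generated file.

```python
import functools


class ToyDataset:
    """Hypothetical stand-in for xarray.Dataset (illustration only)."""

    def __init__(self, variables):
        self.variables = dict(variables)

    def drop_vars(self, names):
        """Return a new ToyDataset without the given variables."""
        return ToyDataset({k: v for k, v in self.variables.items() if k not in names})


class ToyTree:
    """Hypothetical stand-in for DataTree: a flat mapping of path -> ToyDataset."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)


def map_over_nodes(dataset_method):
    """Wrap a dataset method so it is applied to every node of a tree."""

    @functools.wraps(dataset_method)  # copies name and docstring unchanged
    def method(self, *args, **kwargs):
        return type(self)(
            {path: dataset_method(ds, *args, **kwargs) for path, ds in self.nodes.items()}
        )

    return method


# Inject at import time: no generated file is needed, but the methods
# do not appear in the class body, so static tools cannot see them.
for name in ["drop_vars"]:
    setattr(ToyTree, name, map_over_nodes(getattr(ToyDataset, name)))

tree = ToyTree({"/": ToyDataset({"a": 1, "b": 2})})
dropped = tree.drop_vars(["a"])
```

The trade-off in this sketch is the usual one: injection is short, while a generated mixin leaves explicit method definitions in the source that type checkers and documentation tools can read.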

mathause avatar Mar 18 '25 20:03 mathause

@TomNicholas do you think this has a chance of being considered? If not, I am happy to close. It's obviously only a fraction of the missing methods - I can add the rest if this is considered.

One alternative is to generate the file once and then manually adapt the docstrings. That would be a bit less work than doing everything by hand. (It's quite annoying to always write xr.map_over_datasets(lambda ds: ds.rename(...), dt) instead of dt.rename(...), etc. for many of the dataset manipulations.)

mathause avatar Jun 05 '25 13:06 mathause

Hi @mathause, sorry for the super slow reply here.

This is basically how the old datatree used to work. But when we moved everything upstream into xarray, we made an effort to do it the "proper" way, by adding to generate_ops.py/generate_aggregations.py. The logic is that Dataset isn't really special once inside xarray - all similar methods should be generated from a common template, of which Dataset is just one realization.

TomNicholas avatar Sep 10 '25 15:09 TomNicholas

Ok thanks - happy to close the PR in this case. (Although I think these are the methods that are currently not generated in generate_ops.py / generate_aggregations.py, probably because they cannot be generalized, i.e. they contain method-specific logic.)

mathause avatar Sep 10 '25 15:09 mathause