Implement dask-specific methods
This is an initial implementation of the feature requested in #97.
The first implementation here very closely follows the implementation of these methods by xarray.Dataset. For the majority of the methods, this should work fine; we iterate over all the nodes in our tree, starting at the root, and perform the necessary dask.collections API operation. However, __dask_post{compute,persist}__ is a bit more complicated; some additional testing is required to ensure that we're appropriately applying the available support utilities to re-construct our final DataTree without any superfluous work.
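To make the traversal and re-construction idea concrete, here is a rough, dask-free sketch of the pattern. It is not the code in this PR: `ToyNode`, `toy_get`, and `toy_compute` are hypothetical names made up for illustration, and the tiny evaluator only mimics what a real dask scheduler does. The parts that follow the actual dask collection protocol are the method names and the convention that `__dask_postcompute__` returns a `(finalize, extra_args)` pair, which is then called as `finalize(results, *extra_args)`.

```python
from operator import add, mul


class ToyNode:
    """Hypothetical stand-in for a tree node; this is not the datatree API."""

    def __init__(self, name, graph=None, key=None, children=()):
        self.name = name
        self.graph = graph or {}  # dask-style graph: key -> value or (func, *args)
        self.key = key            # output key of this node's data (None if empty)
        self.children = list(children)

    def walk(self, prefix=""):
        # Visit every node, starting at the root, yielding (path, node).
        path = f"{prefix}/{self.name}"
        yield path, self
        for child in self.children:
            yield from child.walk(path)

    def __dask_graph__(self):
        # Merge the per-node graphs into one graph for the whole tree.
        merged = {}
        for _, node in self.walk():
            merged.update(node.graph)
        return merged

    def __dask_keys__(self):
        # One output key per node that holds data, in traversal order.
        return [node.key for _, node in self.walk() if node.key is not None]

    def __dask_postcompute__(self):
        # Return (finalize, extra_args); finalize maps results back to paths.
        paths = [path for path, node in self.walk() if node.key is not None]
        return self._rebuild, (paths,)

    @staticmethod
    def _rebuild(results, paths):
        return dict(zip(paths, results))


def toy_get(graph, key):
    # Minimal recursive evaluator for the toy graph (assumes string keys).
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        resolved = [
            toy_get(graph, a) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        return func(*resolved)
    return task


def toy_compute(collection):
    # Mimics the compute flow: graph -> keys -> results -> finalize.
    graph = collection.__dask_graph__()
    results = [toy_get(graph, k) for k in collection.__dask_keys__()]
    finalize, extra_args = collection.__dask_postcompute__()
    return finalize(results, *extra_args)


root = ToyNode(
    "root",
    children=[
        ToyNode("a", graph={"x": 1, "y": (add, "x", 2)}, key="y"),
        ToyNode("b", graph={"z": (mul, 3, 4)}, key="z"),
    ],
)
result = toy_compute(root)
print(result)  # {'/root/a': 3, '/root/b': 12}
```

The point of the sketch is the ordering contract: the results come back in the same order as `__dask_keys__`, so the finalize step only needs the list of node paths to put each result back in its place, without touching the graph again.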
- [x] Closes #97
- [ ] Tests added
- [ ] Passes pre-commit run --all-files
- [ ] New functions/methods are listed in api.rst
- [ ] Changes are summarized in docs/source/whats-new.rst
Tag @TomNicholas, will work on testing this over the coming days as I have time.
Here's a gist based on @jbusecke's CMIP6 demo showing the top-level integration of load and compute (you can just as easily modify it to show that persist works).
Still left to do are writing some test cases and further deep-diving to make sure that the dask collections API functions we provided here are used.
Thanks for the quick review @TomNicholas, hoping to address later today or tomorrow. Note on the line repetition - looks like I screwed up a merge somewhere, will need to fix that separately.
@darothen wondering if you had any time soon to revisit this PR? Would be great to get it in soon because Julius and I are writing another blog post about using datatree with dask on CMIP6 data.
@TomNicholas I'm hacking on some projects this weekend, let me see if I can wrap things up. Apologies for the delay... it became very hectic at work shortly after the hackathon and I haven't had much time for side projects.