Merge Datatree siblings when they are compatible

Open jonas-spaeth opened this issue 2 years ago • 1 comments

Hey,

I've come across datatree only recently, but I already see many use cases for it in my work, so thanks for the effort!

A minimal example of what I'd like to have:

  • I have data from two models (say 'us' and 'eu'), and they have different x and y coordinates → therefore I store them as siblings in a datatree dt (structure: "forecasts/us", "forecasts/eu") instead of a single Dataset
  • After some manipulation, both Datatrees are actually compatible, e.g., because I averaged over x and y
  • for further analyses, I'd like to concat the two Datatrees in a single Dataset along the dimension model, something like xr.concat([dt["us"].ds, dt["eu"].ds], dim="model").assign_coords(model=["us", "eu"]) (for example, I could then use dt['forecasts'].ds.t2m.plot(hue='model'))
  • it would be nice to allow such an operation, e.g., via dt.concat_leaves(), where the result is a datatree "forecasts"
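
For illustration, the workaround described above could be sketched with plain xarray as follows. This is only a sketch under assumptions: a dict of Datasets stands in for the two sibling nodes, `concat_siblings` is a hypothetical helper (not an existing datatree method), and the grid sizes and constant t2m values are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr


def concat_siblings(siblings, dim):
    """Concatenate compatible sibling Datasets along a new dimension.

    Hypothetical helper: `siblings` maps node name -> Dataset, and the
    node names become the coordinate values of the new dimension `dim`.
    """
    names = list(siblings)
    # A named pandas Index creates the new dimension and its coordinate.
    index = pd.Index(names, name=dim)
    return xr.concat([siblings[name] for name in names], dim=index)


time = np.arange(3)

# Two models on different (made-up) x/y grids, so they cannot share one Dataset.
us = xr.Dataset(
    {"t2m": (("time", "y", "x"), np.full((3, 3, 4), 1.0))},
    coords={"time": time, "y": np.arange(3), "x": np.arange(4)},
)
eu = xr.Dataset(
    {"t2m": (("time", "y", "x"), np.full((3, 2, 5), 2.0))},
    coords={"time": time, "y": np.arange(2), "x": np.arange(5)},
)

# Averaging over x and y removes the incompatible dimensions.
siblings = {"us": us.mean(["x", "y"]), "eu": eu.mean(["x", "y"])}

combined = concat_siblings(siblings, dim="model")
print(combined)
```

With a real tree, the dict would presumably be built from the child nodes' datasets (e.g. the `.ds` of each sibling, as in the snippet above), after which something like `combined.t2m.plot(hue="model")` becomes possible.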

A very similar use case would be:

  • I have forecast runs where the resolution changes after day 15 of the integration
  • I could store them as a datatree ("forecast/short-range", "forecast/medium-range")
  • after doing some manipulation, e.g., spatial averaging, both forecast ranges could be compatible, and one sibling stores leadtime days 0-14 and one stores days 15-46
  • it would be cool to have again something like dt.merge_leaves() to have a new dataset with a continuous leadtime
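
Again only as a sketch with plain xarray: once both ranges are spatially averaged, they can be joined along the existing leadtime dimension. The grid sizes and random values are made up, and the two Datasets stand in for the "short-range" and "medium-range" nodes.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Days 0-14 on a finer (made-up) grid.
short_range = xr.Dataset(
    {"t2m": (("leadtime", "y", "x"), rng.normal(size=(15, 4, 6)))},
    coords={"leadtime": np.arange(0, 15)},
)
# Days 15-46 on a coarser (made-up) grid.
medium_range = xr.Dataset(
    {"t2m": (("leadtime", "y", "x"), rng.normal(size=(32, 2, 3)))},
    coords={"leadtime": np.arange(15, 47)},
)

# After spatial averaging, only the leadtime dimension remains in both.
parts = [ds.mean(["x", "y"]) for ds in (short_range, medium_range)]

# Concatenate along the existing dimension to get a continuous leadtime axis.
continuous = xr.concat(parts, dim="leadtime")
print(continuous)
```

Here the order of `parts` determines the order along leadtime, so the siblings need to be passed in increasing leadtime order (or the result sorted with `sortby("leadtime")`).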

If something like that is already possible I apologize for my ignorance.

Again, thanks for putting this together.

Cheers, Jonas

jonas-spaeth, Jan 19 '23 13:01

see #192 for some discussion on collapsing subtrees in general

keewis, Jan 19 '23 15:01

Closed in favor of #192

flamingbear, Aug 13 '24 16:08