iris icon indicating copy to clipboard operation
iris copied to clipboard

Trial `iris.common.resolve` in `CubeList.merge()`

Open trexfeathers opened this issue 2 years ago • 1 comments

resolve is intended to develop into a single-source-of-truth for handling Cube equality, difference and combination. It is currently (2023-07-27) only used for Cube maths.

A big part of CubeList.merge() is analysing/resolving differences between the Cubes to be merged. This is a prime case for instead using resolve.

Benefits

  • Bring Iris closer to the single-source-of-truth - easier to discuss, explain, plan future development.
  • Enables the spread of Leniency (#4446) to other parts of the code-base in a consistent way - current application in Maths is via resolve.

Steps to complete

  • [ ] Experimentally replace the current resolution code of merge() with call(s) to iris.common.resolve.
  • [ ] Run the Iris tests, record failures in this issue
  • [ ] Run the Iris benchmarks, record regressions vs main branch in this issue
  • [ ] Write an on-demand benchmark, modeled on this existing benchmark but with more extreme scaling (100 Cubes?). Run and record that, too!
  • [ ] Discuss with other core developers (2023-07-27: we are trialling Dragon Taming meetings).
    • Does resolve prevent/enable merges that were previously enabled/prevented? Can we defend these changes? Do we need to adapt resolve to get closer to the existing merge() behaviour?
    • Is performance acceptable versus the existing behaviour? Do we need to adapt resolve to better scale with >2 Cubes?
    • After these concerns, what is left to make resolve-in-merge a reality? Testing changes? Documentation?
  • [ ] Discuss the changes to the API to control lenient and strict behaviour
  • [ ] Decide on the default behaviour, with breaking change implications to be considered
  • [ ] Write up the agreed next steps as further issues in the tasklist here: #4446. Small, achievable tasks: each task should be complete-able (write and review) in a maximum of 2 weeks.

Most relevant tests

https://github.com/SciTools/iris/blob/3c1b3e6993ba1283b5630ea24ee450d2bd7f246f/lib/iris/tests/test_merge.py#L6-L9

https://github.com/SciTools/iris/blob/3c1b3e6993ba1283b5630ea24ee450d2bd7f246f/lib/iris/tests/integration/merge/test_merge.py#L6-L9

https://github.com/SciTools/iris/blob/3c1b3e6993ba1283b5630ea24ee450d2bd7f246f/lib/iris/tests/unit/cube/test_CubeList.py#L215

Possibly some loading tests too, since merge is used during the loading process.

Most relevant benchmarks

https://github.com/SciTools/iris/blob/68eaa535debcbcc0462416cd23e98e82548711a5/benchmarks/benchmarks/cube.py#L206

trexfeathers avatar Jul 27 '23 12:07 trexfeathers

Notes from team conversation

  • Resolve includes compromises to please specific power-users who wanted it as a tool, to combine with post-fixes. So it often just 'makes things work'. This is inappropriate for merge, because it is used during load, and will magically change things so the cube no longer represents what is in the file
  • Example of a compromise: willingly combining cubes with different standard names, which effectively means a different identity
  • Resolve is working hard to find commonalities, rather than spotting differences, which may explain the poor scalability
  • So resolve is not generalisable, but there is some appetite for finding the generalisable bits of resolve and abstracting them out
    • Doing this will be a big project
    • Getting it right will need meaningful use cases for both cube arithmetic and merge (and load?)
    • We will need a full understanding of the decisions that resolve makes, and the decisions that merge makes, and the flow that they go through
  • Need to feed this information back into future dragon taming sessions

trexfeathers avatar Feb 21 '24 11:02 trexfeathers