iris
Time dimension in hybrid height
✨ Feature Request
As described in CF conventions
Motivation
From @matthew-mizielinski. Considering how to represent hybrid height over glaciers, where the orography moves/changes over time.
Additional context
@pp-mo expects problems with FF or PP loading, as it would require a specific sequence of merge steps.
### Tasks
- [ ] https://github.com/SciTools/iris/issues/6162
- [ ] https://github.com/SciTools/iris/issues/6163
@matthew-mizielinski has confirmed that data like this currently generates 1 Cube for each time point, rather than a single Cube with a time dimension.
It will be problematic for UK Met Office strategy (climate - IPCC) if this misses the 3.11 (October) release.
I believe it is currently possible to construct a hybrid height coordinate that varies over time. What is not possible is to merge multiple 2D cubes with varying orographies together. This would require a substantial change to merge behaviour. I suspect this may be covered by #5375 which has been a particularly stubborn issue to untangle.
@stephenworsley let us know what you need. If necessary we have a whole team of developers (given the strategic importance of this).
Shout if a discussion on this would be useful -- I'm sure we can come up with a minimal test data set to work with.
@matthew-mizielinski minimal test data would absolutely be appreciated, and yes, I think it would be good to set up a discussion when possible.
One possible idea for resolving the merge issue:
Provide a keyword argument for the merge method, to which you can pass the name of an AuxCoord or a tuple of coord names. This tells merge which coordinates it ought to expand the dimensions of. Further information is likely to be required in the case where multiple dimensions are being added by merge, perhaps a tuple of dimension names in which to expand each AuxCoord. This keyword could also be passed down from the load function.
This approach shouldn't break existing functionality and should allow sufficient control of the merging process. I expect we may need to give some attention to AuxCoordFactorys to make sure they behave sensibly during this process, since I'm not aware of any other functions which add a dimension to a coordinate that another coordinate is derived from, but I don't expect this to be too much of a problem.
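To make the proposal concrete, a purely hypothetical call might look like the following. Note that the `expand_coords` keyword and its form are invented here to illustrate the idea; nothing like it exists in iris today:

```
# Hypothetical, NOT real iris API: a load/merge "hint" naming the
# aux-coords whose dimensions merge should expand, and the new
# dimension(s) each should be expanded along.
cubes = iris.load(
    filenames,
    expand_coords={"surface_altitude": ("time",)},
)
```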
An alternate approach to explore could involve concatenating instead of merging and using the new_axis utility to expand the dimensions of the orography coordinate appropriately. This ought to be enabled now via #4896, though I'm not sure how this handles derived coordinates.
Some summary points from our offline discussion today (@pp-mo, @stephenworsley, @matthew-mizielinski)
Use case example
We investigated a specific use case which demonstrates the issue here.
- monthly files spread across multiple years, so timepoints are monthly
- each phenomenon (stash) has dimensions (time, model_level, y, x)
- the orography is surface_altitude(time, y, x), and it changes each year (so it is the same across the 12 monthly points within each year)
We tried loading selected monthly files, e.g.
```python
iris.load(['sep30', 'oct30', 'jan31'])  # imaginary monthly files (!)
```
- The source PP fields (as seen from "load_raw") are of course 2d.
- with normal load (i.e. not 'load_raw'), adjacent months produce a single phenomenon cube with a common 2D orography
- a mixture of years produces a data-cube per year, and a single merged orography cube
- in a normal (i.e. merge-processed) load which spans multiple years (but considering only one phenomenon for now):
  - there are multiple data cubes, each containing one year, with ...
    - a single scalar timepoint
    - an associated 2D orography ("surface_height") aux-coord (matching the year timepoint)
    - a 2D factory coord (not mapped to time)
  - a single orography cube, which has a time dimension, merged from all the timepoints
N.B. we have sample test data to demo this
Solutions acceptable to the user
@matthew-mizielinski said, for his expected usage, it should be easy to identify what data suffers from the "missing merge" like this, and potentially add a specific load keyword as a "hint" (as suggested above), or call into a post-load adjustment utility.
Summary of findings regarding the existing code
- we can see why it doesn't "just work", because merge cannot merge factories ...
- ... and in any case, factory references are attached separately to each raw datacube, and always as a single, 2d field, since no merged orography is available at the "raw cube" stage
- however, it appears that concatenate can now "merge factories": see here
- likewise, the promote_aux_coord_to_dim_coord utility now has the ability to "promote" a set of (user-specified) scalar coords to a length-1 extra dimension: see here
- contrary to @pp-mo's prior concerns, the relationship between raw orography fields and data fields is not obscure, since the orography info is all correctly labelled with timepoints matching the data
Hence, in the above use case, orography always loads as a single cube with a "complete" time dimension (unlike the data fields). Therefore it is not absolutely necessary to change the low-level loading mechanisms.
- we are concerned that re-writing merge (or concatenate) to achieve this automatically would be very involved
- although it seems logically feasible, since all the relevant metadata exists in the loaded data as we have it
- ... however the code is very complex, and some previous attempts to extend it had to be abandoned due to unforeseen changes affecting backwards compatibility
- so, it seems high-risk to propose a major overhaul which could make it even more complicated
- it also seems hard to work out, automatically + in general, which coords should be merged to create an extra factory dimension
- hence, a separate "additional" facility, with user-hint input, seems more likely to succeed
Possible solutions we can envisage
User presentation (API)
1. a general, automatic fix to merge operations within loading (but see complexity objections, above)
2. a load (and/or merge/concatenate) keyword to enable the "extra" factory building on load
3. a post-load utility call.
In case (2) we might need to worry about selecting the correct cubes to work with in the 'additional' operation. The general 'load+merge' behaviour can produce multiple cubes where one was expected if there is a small mismatch somewhere: in that case it could be hard to apply the 'additional' operation to the correct subsets. But we can limit the expected results, e.g. only allow it in "load_cubes", where a single cube is expected from applying each provided constraint. Likewise, a user-operated post-merge operation could be specified to work only with "suitable" data expected to produce a single result cube.
Calculation
(ignoring for now the "better general merge" approach + looking for easy wins)
In general, we can solve merge/concat problems of this nature by
1. either reducing all data to have a single point in the problem dimension, then merging everything
2. or promoting single-point data to get a length-1 dimension, and concatenating everything

In this case, since we observe that concatenate can combine factories while merge cannot, it seems that (2) is probably easiest.
So it looks like a viable proof-of-concept solution could:
- accept a set of input cubes which (the user says) "ought" to merge into a single result, plus, probably, user hints of which factory/coords to work on
- promote any cubes with scalar time to have a length-1 time dimension, including the relevant factory and all the aux-coords which are its dependencies
- concatenate, expecting a single cube result