iris
iris copied to clipboard
Cube list concatenation oddity
Hey guys, hope yous doing well and healthy and working from home :grin:
I got a question described in this comment and it seems to me there has to be some sort of check in iris
for such cases. here goes:
- function used:
iris.cube.CubeList(cubes).concatenate()
- used case: a cube list like
[A, B, C, D]
where A spans over a time t, B continues from where A stopped and spans for another time t, C starts a bit before B finishes and spans for another time t, D starts where C stopped and spans for another time t (t is arbitrary, what matters is where cubes start and end on the time axis) - result of concatenation: A, B and D will be concatenated into a composite M, C will not be concatenated so the result of
iris.cube.CubeList([A, B, C, D]).concatenate()
is[M=(A+B+D), C]
This is pretty bad since there is a serious time gap between end of B and start of D (where C should be, partially) -> is this something that you would consider writing a warning or a fix for? It's particularly difficult to notice such gapey concatenations when dealing with, say, multiple CMIP experiments to be concatenated into one.
If you guys say meh, then we'll plug a fix in our concatenation wrapper, but I just wanted to alert you on this - :beer:
btw I just gave you a star so now you have to answer :grinning: :beer:
Hi @valeriupredoi, sorry for the delay getting back to you on this.
In terms of what Iris should do here, what's the ideal behaviour for you? My go-to list of options (below) mostly have big enough downsides that I don't think we'd want to implement them. The behaviour is currently expected to "warn" (I realise it's a cryptic warning) by returning you (from your example) [M, C]
which will be a longer list than your intended result ([M]
) and thus indicate that something went wrong/didn't "complete".
My options and their downsides:
- Drop spare data (silently dropping data seems bad, we'd have to prioritise some data over other data)
- Track some form of "maximum gappiness" in a concatenated cube and warn if it's too high (this seems a bit too close to attempting magic IMO as reading user's minds is hard)
- Raise an exception (breaks existing code of users who are using the "get multiple cubes out" functionality)
- Warn that concatenation wasn't "completed" (not got so much against this actually, though introducing warnings to code people think is working fine might worry them a bit)
- Make a new
concatenate_to_one
(name TBC) method for CubeLists (I actually quite like this - it would do forconcatenate
whatload_cube
does forload
orextract_cube
does forextract
...)
Let me know what you think!
We already have concatenate_cube.
cheers muchly @wjbenfold for your reply and being willing to improve the functionality - I have just opened an issue where we may discuss this from our end's point, it may as well be that people are perfectly fine with the current functionality so we'll close this, but let's give my fellow devs a couple days see if there are any suggestions/complaints :grin: Many thanks to you guys for being well responsive to our input BTW :beer:
An instance of this just came up on MO Yammer. In that particular case, the overlapping period had equal data, metadata, etc. So in that case Iris might reasonably be expected to throw away the duplicate part and concatenate anyway. How easy that would be to implement I've no idea!
How easy that would be to implement I've no idea!
My immediate thought is that we'd at least have to realise the data to do that check, which might be enough to rule it out without at least a special mode that users have actively chosen?
I notice that over at https://github.com/ESMValGroup/ESMValCore/issues/1423, @zklaus has argued that such functionality should be added in downstream packages to address specific use-cases.
We could maybe improve the error message from concatenate_cube
though.
import iris
fname = iris.sample_data_path("SOI_Darwin.nc")
complete_cube = iris.load_cube(fname)
overlapping_parts = iris.cube.CubeList([complete_cube[:100], complete_cube[50:]])
overlapping_parts.concatenate_cube()
ConcatenateError: failed to concatenate into a single cube.
An unexpected problem prevented concatenation.
Expected only a single cube, found 2.
What do you reckon would be more helpful @rcomer? Given that a failure of the cubes to all look the same will, from my understanding, fail inside concatenate because of error_on_mismatch=True
, maybe we can go for
ConcatenateError: failed to concatenate into a single cube.
The cubes provided may overlap.
Expected only a single cube, found 2.
If that seems sensible then I can put up the PR to make it happen (or you can and I'll review it)
Looking back at this issue, if there's future appetite for functionality that handles overlap, I can imagine an argument to concatenate_cube
(or an additional function) that prioritises cubes that come earlier in the cubeList
and drops data from later cubes in the list if it was already provided earlier in the list.
I think match
becomes False
here if the coordinates are overlapping. If I’ve got that right, could we raise an exception there to definitively say so? (edit: only if error_on_mismatch
is True
of course.)
I think
match
becomesFalse
here if the coordinates are overlapping. If I’ve got that right, could we raise an exception there to definitively say so? (edit: only iferror_on_mismatch
isTrue
of course.)
We could add descriptive errors all the way through that function (_ProtoCube.register
) I think?
It looks like _sequence
will also have a problem with overlapping bounds so we'd probably want to mention that too
Put this in Iris 3.7 , but can't commit to resolving it : we will look into it now
Closed by #5382