iris icon indicating copy to clipboard operation
iris copied to clipboard

Cube list concatenation oddity

Open valeriupredoi opened this issue 4 years ago • 11 comments

Hey guys, hope yous doing well and healthy and working from home :grin: I got a question described in this comment and it seems to me there has to be some sort of check in iris for such cases. here goes:

  • function used: iris.cube.CubeList(cubes).concatenate()
  • used case: a cube list like [A, B, C, D] where A spans over a time t, B continues from where A stopped and spans for another time t, C starts a bit before B finishes and spans for another time t, D starts where C stopped and spans for another time t (t is arbitrary, what matters is where cubes start and end on the time axis)
  • result of concatenation: A, B and D will be concatenated into a composite M, C will not be concatenated so the result of iris.cube.CubeList([A, B, C, D]).concatenate() is [M=(A+B+D), C]

This is pretty bad since there is a serious time gap between end of B and start of D (where C should be, partially) -> is this something that you would consider writing a warning or a fix for? It's particularly difficult to notice such gapey concatenations when dealing with, say, multiple CMIP experiments to be concatenated into one.

If you guys say meh, then we'll plug a fix in our concatenation wrapper, but I just wanted to alert you on this - :beer:

valeriupredoi avatar Apr 06 '20 17:04 valeriupredoi

btw I just gave you a star so now you have to answer :grinning: :beer:

valeriupredoi avatar Apr 06 '20 17:04 valeriupredoi

Hi @valeriupredoi, sorry for the delay getting back to you on this.

In terms of what Iris should do here, what's the ideal behaviour for you? My go-to list of options (below) mostly have big enough downsides that I don't think we'd want to implement them. The behaviour is currently expected to "warn" (I realise it's a cryptic warning) by returning you (from your example) [M, C] which will be a longer list than your intended result ([M]) and thus indicate that something went wrong/didn't "complete".

My options and their downsides:

  • Drop spare data (silently dropping data seems bad, we'd have to prioritise some data over other data)
  • Track some form of "maximum gappiness" in a concatenated cube and warn if it's too high (this seems a bit too close to attempting magic IMO as reading user's minds is hard)
  • Raise an exception (breaks existing code of users who are using the "get multiple cubes out" functionality)
  • Warn that concatenation wasn't "completed" (not got so much against this actually, though introducing warnings to code people think is working fine might worry them a bit)
  • Make a new concatenate_to_one (name TBC) method for CubeLists (I actually quite like this - it would do for concatenate what load_cube does for load or extract_cube does for extract...)

Let me know what you think!

wjbenfold avatar Jan 12 '22 16:01 wjbenfold

We already have concatenate_cube.

rcomer avatar Jan 12 '22 21:01 rcomer

cheers muchly @wjbenfold for your reply and being willing to improve the functionality - I have just opened an issue where we may discuss this from our end's point, it may as well be that people are perfectly fine with the current functionality so we'll close this, but let's give my fellow devs a couple days see if there are any suggestions/complaints :grin: Many thanks to you guys for being well responsive to our input BTW :beer:

valeriupredoi avatar Jan 17 '22 12:01 valeriupredoi

An instance of this just came up on MO Yammer. In that particular case, the overlapping period had equal data, metadata, etc. So in that case Iris might reasonably be expected to throw away the duplicate part and concatenate anyway. How easy that would be to implement I've no idea!

rcomer avatar Mar 11 '22 15:03 rcomer

How easy that would be to implement I've no idea!

My immediate thought is that we'd at least have to realise the data to do that check, which might be enough to rule it out without at least a special mode that users have actively chosen?

wjbenfold avatar Mar 11 '22 17:03 wjbenfold

I notice that over at https://github.com/ESMValGroup/ESMValCore/issues/1423, @zklaus has argued that such functionality should be added in downstream packages to address specific use-cases.

We could maybe improve the error message from concatenate_cube though.

import iris

fname = iris.sample_data_path("SOI_Darwin.nc")
complete_cube = iris.load_cube(fname)

overlapping_parts = iris.cube.CubeList([complete_cube[:100], complete_cube[50:]])
overlapping_parts.concatenate_cube()
ConcatenateError: failed to concatenate into a single cube.
  An unexpected problem prevented concatenation.
  Expected only a single cube, found 2.

rcomer avatar Mar 14 '22 11:03 rcomer

What do you reckon would be more helpful @rcomer? Given that a failure of the cubes to all look the same will, from my understanding, fail inside concatenate because of error_on_mismatch=True, maybe we can go for

ConcatenateError: failed to concatenate into a single cube.
  The cubes provided may overlap.
  Expected only a single cube, found 2.

If that seems sensible then I can put up the PR to make it happen (or you can and I'll review it)

wjbenfold avatar Mar 14 '22 11:03 wjbenfold

Looking back at this issue, if there's future appetite for functionality that handles overlap, I can imagine an argument to concatenate_cube (or an additional function) that prioritises cubes that come earlier in the cubeList and drops data from later cubes in the list if it was already provided earlier in the list.

wjbenfold avatar Mar 14 '22 12:03 wjbenfold

I think match becomes False here if the coordinates are overlapping. If I’ve got that right, could we raise an exception there to definitively say so? (edit: only if error_on_mismatch is True of course.)

rcomer avatar Mar 14 '22 20:03 rcomer

I think match becomes False here if the coordinates are overlapping. If I’ve got that right, could we raise an exception there to definitively say so? (edit: only if error_on_mismatch is True of course.)

We could add descriptive errors all the way through that function (_ProtoCube.register) I think?

It looks like _sequence will also have a problem with overlapping bounds so we'd probably want to mention that too

wjbenfold avatar Mar 15 '22 09:03 wjbenfold

Put this in Iris 3.7 , but can't commit to resolving it : we will look into it now

pp-mo avatar Jun 27 '23 10:06 pp-mo

Closed by #5382

trexfeathers avatar Jul 26 '23 10:07 trexfeathers