iris icon indicating copy to clipboard operation
iris copied to clipboard

Case insensitive lenient special name support?

Open bouweandela opened this issue 4 years ago • 10 comments

✨ Feature Request

Motivation

It was recently reported to me by @axel-lauer that he experienced frustration with using the iris library via ESMValTool because it failed to merge some cubes because some had a longitude coordinate with long_name "Longitude" (the CMIP6 standard), while others had a longitude coordinate with long_name "longitude" (the CMIP5 standard).

Would this already be solved by #4446 or would some additional changes be needed, such as changing special lenient name behaviour such that is case insensitive (if that doesn't conflict with the CF Conventions)?

bouweandela avatar Dec 07 '21 10:12 bouweandela

Hi @bouweandela, would those coordinates also have standard_names? If so, I think the lenient comparison would ignore the long_name clash.

rcomer avatar Dec 07 '21 12:12 rcomer

Yes, they would have standard_names, so that sounds promising!

bouweandela avatar Dec 07 '21 13:12 bouweandela

Sorry, finger slipped!

rcomer avatar Dec 07 '21 21:12 rcomer

It seems I had misremembered in the above. The mismatched long_names do cause an error for lenient arithmetic (with Iris3.1):

import iris
import iris.cube
import iris.coords

coord1 = iris.coords.DimCoord(range(5), standard_name="longitude", long_name="Longitude")
coord2 = iris.coords.DimCoord(range(5), standard_name="longitude", long_name="longitude")

cube1 = iris.cube.Cube(range(5), dim_coords_and_dims=[(coord1, 0)])
cube2 = iris.cube.Cube(range(5), dim_coords_and_dims=[(coord2, 0)])

cube1 - cube2
ValueError: Insufficient matching coordinate metadata to resolve cubes, cannot map dimension (0,) of the RHS cube 'unknown' to the LHS cube 'unknown'.

You can have mismatched var_names though:

coord1.var_name = "foo"
coord3 = coord1.copy()
coord3.var_name = "bar"

cube3 = iris.cube.Cube(range(5), dim_coords_and_dims=[(coord3, 0)])

print(cube1 - cube3)
unknown / (unknown)                 (longitude: 5)
    Dimension coordinates:
        longitude                             x

rcomer avatar Dec 15 '21 11:12 rcomer

For equality, lenient behaviour is really only your friend in the cases outlined here i.e., when you're comparing two things, and one thing has a value and the other not (i.e., "something" vs None)

The only other special treatment is with regards to var_name (currently).

That all said, for strict/lenient equality a long_name="latitude" compared with long_name="Latitude" will always be False, as you're comparing "something" vs "something".

Yes, we could say, well for lenient behaviours make str equality case insensitive, which is ideal to adhere to the CMIP6 standard, but on the flip side it then breaks someone else's workflow where they do actually care about long_name="longitude" vs long_name="LONGITUDE" being different (don't know why, but they might)... unfortunately, there is just no way for us to know this; it's incredibly local and specific to users, their data and their workflow i.e., different people care about different things (and that's totally okay).

IMHO adding baked in local exceptions to iris really is a slippy slope from a maintenance perspective, and I'd be keen to avoid it.

In summary, without thinking too deeply about this, the obvious options are:

  1. either iris opens up the ability for users to define generic behavious to be applied during lenient operations at the API level,
  2. users align metadata values beforehand in the knowledge of how iris strict and lenient operations behave, or
  3. allow users that want "special" lenient treatment, a hook to monkey-patch the local behaviour that they want and need for their workflow

Personally, knowing what I know, I'm attracted to option [3] from the perspective that it's completely achievable with very little development effort. This is compared to option [1] which is a HUGE development undertaking, and option [2] which is a HUGE inconvenience for users.

Also, going back to your question @bouweandela, as it stands, no #4446 wouldn't help you alone here... however, it certainly would help you along with option [3].

bjlittle avatar Dec 15 '21 13:12 bjlittle

Okay, I'm going to proof-of-concept option [3] (above) in the context of your use case and see how it hangs together.

This'll hopefully give us a concrete strawman (and hopefully proposal) of how we might be able to move forwards with this issue.

bjlittle avatar Dec 15 '21 13:12 bjlittle

I like the idea of option 3 from the user POV too. Could it work similarly to the FUTURE flag? I.e. I can set it once at the top of my script so it modifies all behaviour in my script, or I can choose to use it as a context manager to just modify the behaviour in this one section.

rcomer avatar Dec 15 '21 13:12 rcomer

I like the idea of option 3 from the user POV too. Could it work similarly to the FUTURE flag? I.e. I can set it once at the top of my script so it modifies all behaviour in my script, or I can choose to use it as a context manager to just modify the behaviour in this one section.

That would be lovely, right? :thinking:

I think there are at least two threads here:

  • does the option [3] mechanism work, and is it palatable to both developers and users alike
  • how will it be used, and how do we best exercise control of when these "special" behaviours kick-in, so to speak

bjlittle avatar Dec 15 '21 13:12 bjlittle

Thanks for looking into this @bjlittle and @rcomer and my apologies for being slow to reply. It sounds like option 3 is as good as it gets. I personally feel that being super strict on things is what scares scientists off from using iris, but maybe the iris library is not the place to fix that and we need we need libraries like esmvalcore to carry out all the homogenising of metadata before passing things on to iris. I do try to report the more generic things back to iris, that's why i opened this issue.

Another example of super strict behaviour that only makes sense to programmers and not to climate scientists is refusing to merge floating point coordinates that only differ by 1 ULP.

bouweandela avatar Feb 14 '22 20:02 bouweandela

Thinking since more about option 3: does having this option simplify the user's code in practice? It is fairly straightforward to write code to homogenise the capitalisation of coordinate names, it's just a bit unexpected that this is needed. Do the CF conventions have anything to say on whether or not variable names are case sensitive?

bouweandela avatar Feb 14 '22 20:02 bouweandela

We've discussed this as a team @SciTools/peloton and feel this is a case for user workaround. This is beyond the intended scope of lenient operations.

ESadek-MO avatar May 24 '23 09:05 ESadek-MO