iris Handling of Standard name aliases

✨ Feature Request

When Iris is installed, the file lib/std_names.py is generated from the XML version of the standard name table. In this process all aliased standard names are "promoted" to be on a par with all other standard names.
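To illustrate the difference between "promoting" aliases and keeping them separate, here is a minimal sketch. It assumes the table XML uses `entry` elements for standard names and `alias` elements containing an `entry_id` for aliases. The names in the sample are invented placeholders, and `parse_table` and `promote_aliases` are hypothetical helpers, not the actual Iris generator code.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment in the assumed layout of the standard name table XML.
# The names are invented placeholders, not real table entries.
SAMPLE_XML = """\
<standard_name_table>
  <version_number>0</version_number>
  <entry id="new_quantity_name">
    <canonical_units>K</canonical_units>
    <description>Placeholder description.</description>
  </entry>
  <alias id="old_quantity_name">
    <entry_id>new_quantity_name</entry_id>
  </alias>
</standard_name_table>
"""


def parse_table(xml_text):
    """Return (entries, aliases), keeping aliases separate from entries."""
    root = ET.fromstring(xml_text)
    # Map each standard name to its canonical units.
    entries = {
        entry.get("id"): entry.findtext("canonical_units")
        for entry in root.iter("entry")
    }
    # Map each aliased (old) name to the name that replaces it.
    aliases = {
        alias.get("id"): alias.findtext("entry_id")
        for alias in root.iter("alias")
    }
    return entries, aliases


def promote_aliases(entries, aliases):
    """Mimic the 'promotion': every alias becomes an ordinary name."""
    flat = dict(entries)
    for old_name, new_name in aliases.items():
        flat[old_name] = entries[new_name]
    return flat


entries, aliases = parse_table(SAMPLE_XML)
flat = promote_aliases(entries, aliases)
```

After promotion, `flat` no longer records which names are aliases, whereas the separate `aliases` dict keeps the old-to-new mapping available for translation.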

Wouldn't it be more appropriate to replace them with the corresponding new standard names? That is, Iris could still read and understand the aliased standard names, but would translate them to the new ones, which are then used in further processing and when writing data.

Motivation

To my understanding the idea behind aliasing a standard name is that the new standard name better/more precisely describes the quantity at hand, perhaps even taking new research results into account.

Apr 18 '23 15:04 larsbarring

C.f. #5255, which mentions the creation of lib/std_names.py during Iris installation.

Apr 18 '23 16:04 larsbarring

Hey @larsbarring thanks for raising this.

We need to have a think about the implications of what you're suggesting, so leave it with us and hopefully we'll get back to you asap 👍

Apr 19 '23 09:04 bjlittle

Replacing the names outright without user intervention seems like an extreme approach. Preferable would be a utility which promotes these names, and perhaps a warning when such names are loaded to suggest this to the user. Does this seem like a reasonable approach?

Apr 26 '23 09:04 stephenworsley

I agree :+1: Either a utility, or perhaps a kwarg to the iris.load functions: something like

iris.load(..., std_name_aliases= {"warn" | "keep" | "replace"} , ...)
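As a rough sketch of what the three options could mean for a single name: the function below is hypothetical and not part of Iris, the alias table is an invented placeholder, and it assumes that "warn" leaves the name unchanged while issuing a warning (other readings are possible).

```python
import warnings

# Stand-in alias table; the alias pair is an invented placeholder.
ALIASES = {"old_quantity_name": "new_quantity_name"}

VALID_MODES = ("keep", "warn", "replace")


def resolve_standard_name(name, std_name_aliases="keep", aliases=ALIASES):
    """Return the name to use for `name` under the chosen alias mode."""
    if std_name_aliases not in VALID_MODES:
        raise ValueError(f"std_name_aliases must be one of {VALID_MODES}")
    new_name = aliases.get(name)
    # Not an alias, or the caller asked to keep aliases untouched.
    if new_name is None or std_name_aliases == "keep":
        return name
    if std_name_aliases == "warn":
        # Assumption: warn but keep the old name, leaving the choice to the user.
        warnings.warn(
            f"{name!r} is an alias of {new_name!r}", UserWarning, stacklevel=2
        )
        return name
    # "replace": silently translate to the new name.
    return new_name
```

A loader supporting such a keyword would apply this to each loaded standard name; names that are not aliases pass through unchanged in every mode.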

Apr 26 '23 10:04 larsbarring

I'd say that the key place to issue warnings would be on save. We don't have to care that much about 'correct' use of standard-names in data loaded into Iris, since Iris doesn't interpret them for almost any purposes. But we do try to employ best practice when we write output files.

Apr 26 '23 12:04 pp-mo

Well, Iris as such might not use the standard name much, but as users of the Iris API we might be interested in this, and would appreciate either not having to deal with aliases (which might evolve over time) or getting a warning if a name is aliased. But I admit that this is somewhat speculative/forward-looking, as I do not have an immediate hands-on use case.

Apr 26 '23 14:04 larsbarring

Just by complete chance, when looking for something else, I came across this use case for handling aliases in a more sophisticated way than is now the case.

May 04 '23 16:05 larsbarring

@SciTools/peloton considering what is possible, according to the statement here in the CF FAQ

We may need to be pragmatic, since the suggestion is that older aliases can be ambiguous, in which case an automatic translation is simply not always possible. But a quick inspection of the table does seem to indicate that it might be practicable, in most/many cases.

Also we have concerns that automatic standard-name translation on load is a bit "dangerous" in breaking user code, since the same code loading the same data would produce cubes with different names on an Iris / standard-name-table upgrade. ( N.B. technically you can upgrade your installation to the latest std-names at any time, though we suspect it's rarely done ! )

Another thought : in keeping with recent decisions, we might prefer to control loading via a context manager than add keywords (but it's only a style thing).
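For illustration, a context-manager style of control might look something like the sketch below. All names here (`alias_processing`, `current_alias_mode`) are hypothetical, and this is not how Iris implements its own settings; it only shows the scoping idea using the standard library.

```python
import contextlib
import contextvars

# Hypothetical setting: how aliased standard names are treated while loading.
_ALIAS_MODE = contextvars.ContextVar("alias_mode", default="keep")
_VALID_MODES = ("keep", "warn", "replace")


@contextlib.contextmanager
def alias_processing(mode):
    """Temporarily set the alias-handling mode inside a `with` block."""
    if mode not in _VALID_MODES:
        raise ValueError(f"mode must be one of {_VALID_MODES}")
    token = _ALIAS_MODE.set(mode)
    try:
        yield
    finally:
        # Restore whatever mode was active before the block.
        _ALIAS_MODE.reset(token)


def current_alias_mode():
    """Return the mode a loader would consult at this point."""
    return _ALIAS_MODE.get()


with alias_processing("replace"):
    inside = current_alias_mode()
outside = current_alias_mode()
```

The appeal of this style is that the default behaviour stays unchanged outside the block, so existing user code is unaffected unless it opts in.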

In view of that, and above comments, can you come up with an implementation proposal for enhanced load and save behaviours (or utility) @larsbarring ?

May 10 '23 09:05 pp-mo

In fact I have been playing with some ideas during the last couple of days (as always very much limited by pretty basic [pun intended] coding skills):

Reorganising the iris.std_names.py a bit to have a separate dict for aliases (done via an updated tools/generate_std_names.py). It now includes some table version information, and a separate dict for the standard name descriptions (optional when generated).
Adding a new std_name_table.py containing the following functions:
get_convention -- return a tentative Conventions string
set_alias_processing -- define how to handle aliases ("keep" - current behaviour, "warn" - warn and update, "replace" - silently update)
get_description -- return the standard name description
check_valid_standard_name -- check if a name is a standard name or an alias, and do the translation if requested

std_name_table is naively imported in iris.__init__ and std_name_table.check_valid_std_name is called in common.mixin.py. From a design point of view I think that this is as far as I can reach. If you think these ideas, which are rather un-pythonic and un-anything, are worth considering, I would need some support for taking it further. E.g. context managers are beyond my level...

May 10 '23 10:05 larsbarring

And when reading your comment again @pp-mo, I think that you hit the nail on the head when writing

Also we have concerns that automatic standard-name translation on load is a bit "dangerous" in breaking user code, since the same code loading the same data would produce cubes with different names on an Iris / standard-name-table upgrade. ( N.B. technically you can upgrade your installation to the latest std-names at any time, though we suspect it's rarely done ! )

I totally agree and this was my motivation for asking for standard name version information (#5255). And regarding upgrading the standard name table, I was asking myself whether this could be done more dynamically. I.e. what if there was an iris.util.get_new_CF_standard_name_table that would basically do what is now done during setup?

May 10 '23 12:05 larsbarring

If we had the dictionary of aliases, it would be pretty easy to provide a callback function that renames the cubes. If the dictionary was public then users could use it to create their own callback functions, which would give maximum flexibility.
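A minimal sketch of such a callback, assuming the usual Iris load-callback signature `(cube, field, filename)`. The alias dict and `make_alias_callback` are hypothetical, the alias pair is an invented placeholder, and `FakeCube` is a stand-in so that the example runs without Iris.

```python
import warnings

# Stand-in for a public alias dictionary; the pair is an invented placeholder.
ALIASES = {"old_quantity_name": "new_quantity_name"}


def make_alias_callback(aliases, warn=True):
    """Build a load callback that renames cubes carrying aliased names."""

    def callback(cube, field, filename):
        new_name = aliases.get(cube.standard_name)
        if new_name is None:
            return  # not an alias (or no standard name): leave the cube alone
        if warn:
            warnings.warn(
                f"{filename}: replacing alias {cube.standard_name!r} "
                f"with {new_name!r}",
                UserWarning,
            )
        cube.standard_name = new_name

    return callback


class FakeCube:
    """Minimal stand-in for a cube, for demonstration only."""

    def __init__(self, standard_name):
        self.standard_name = standard_name


cube = FakeCube("old_quantity_name")
make_alias_callback(ALIASES, warn=False)(cube, None, "example.nc")
```

With real data the callback would be passed to the loader, along the lines of `iris.load(path, callback=make_alias_callback(ALIASES))`, and users could swap in their own dictionary or warning policy.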

May 10 '23 16:05 rcomer

OK, I bit the bullet and have just made a POC PR (#5313). The dictionary @rcomer is asking for is available as iris.std_names.ALIASES.

May 11 '23 10:05 larsbarring