Handling of standard name aliases
✨ Feature Request
When Iris is installed, the file lib/std_names.py is generated from the XML version of the standard name table. In this process, all aliased standard names are "promoted" to be on a par with all the other standard names.
Wouldn't it be more appropriate to have them replaced by the corresponding new standard names? That is, Iris would still read and understand the aliased standard names, but would translate them to the new ones, which would then be used in further processing and when writing data.
Motivation
To my understanding, the idea behind aliasing a standard name is that the new standard name describes the quantity at hand better or more precisely, perhaps even taking new research results into account.
Cf. #5255, which mentions the creation of lib/std_names.py during Iris installation.
Hey @larsbarring, thanks for raising this.
We need to have a think about the implications of what you're suggesting, so leave it with us and hopefully we'll get back to you asap 👍
Replacing the names outright without user intervention seems like an extreme approach. Preferable would be a utility which promotes these names, perhaps together with a warning when such names are loaded, suggesting this to the user. Does this seem like a reasonable approach?
I agree :+1: Either a utility, or perhaps a kwarg to the iris.load functions: something like
iris.load(..., std_name_aliases={"warn" | "keep" | "replace"}, ...)
I'd say that the key place to issue warnings would be on save. We don't have to care that much about 'correct' use of standard names in data loaded into Iris, since Iris doesn't interpret them for almost any purpose. But we do try to employ best practice when we write output files.
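A save-time check could be as simple as scanning the standard names about to be written and warning for each alias. A minimal sketch, assuming a mapping from aliased names to current names is available; `ALIASES`, the placeholder names in it, `StandardNameAliasWarning` and `check_names_before_save` are all hypothetical, not existing Iris API.

```python
import warnings

# Hypothetical alias table: aliased (old) standard name -> current standard name.
# The names are placeholders, not real entries from the CF table.
ALIASES = {"old_quantity_name": "new_quantity_name"}


class StandardNameAliasWarning(UserWarning):
    """Hypothetical warning category for aliased standard names."""


def check_names_before_save(standard_names, aliases=ALIASES):
    """Warn once per aliased standard name and return the aliased names found."""
    found = []
    for name in standard_names:
        # None (no standard name set) is never a key, so it is skipped naturally.
        if name in aliases:
            found.append(name)
            warnings.warn(
                f"{name!r} is an alias; the current standard name is {aliases[name]!r}.",
                StandardNameAliasWarning,
                stacklevel=2,
            )
    return found
```

A saver could call this with the `standard_name` of each cube before writing, leaving the data itself untouched.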
Well, Iris as such might not use the standard names much, but as users of the Iris API we might be interested in this, and would appreciate either not having to deal with aliases (which might evolve over time) or getting a warning if a name is aliased. But I admit that this is somewhat speculative/forward-looking, as I do not have an immediate hands-on use case.
Just by complete chance, while looking for something else, I came across this use case for handling aliases in a more sophisticated way than is currently the case.
@SciTools/peloton, considering what is possible according to the statement here in the CF FAQ:
We may need to be pragmatic, since the suggestion is that older aliases can be ambiguous, in which case an automatic translation is simply not always possible. But a quick inspection of the table does seem to indicate that it might be practicable in most/many cases.
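Whether automatic translation is practicable could be checked mechanically: an alias that points to exactly one current name can be translated, while one that points to several cannot. A minimal sketch, assuming the XML table stores each alias as an `<alias id="...">` element with one or more `<entry_id>` children; the sample XML and the function name `classify_aliases` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical excerpt mimicking the layout assumed for the CF table XML.
SAMPLE_XML = """<standard_name_table>
  <entry id="new_quantity_name"><canonical_units>K</canonical_units></entry>
  <entry id="split_name_a"><canonical_units>1</canonical_units></entry>
  <entry id="split_name_b"><canonical_units>1</canonical_units></entry>
  <alias id="old_quantity_name"><entry_id>new_quantity_name</entry_id></alias>
  <alias id="old_vague_name"><entry_id>split_name_a</entry_id><entry_id>split_name_b</entry_id></alias>
</standard_name_table>"""


def classify_aliases(xml_text):
    """Split aliases into translatable (one target) and ambiguous (zero or several targets)."""
    root = ET.fromstring(xml_text)
    unambiguous, ambiguous = {}, {}
    for alias in root.iter("alias"):
        targets = [(e.text or "").strip() for e in alias.findall("entry_id")]
        if len(targets) == 1:
            unambiguous[alias.get("id")] = targets[0]
        else:
            ambiguous[alias.get("id")] = targets
    return unambiguous, ambiguous
```

Run against the full table, the size of the `ambiguous` dict would show how many aliases need a manual decision rather than automatic renaming.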
Also we have concerns that automatic standard-name translation on load is a bit "dangerous" in breaking user code, since the same code loading the same data would produce cubes with different names on an Iris / standard-name-table upgrade. (N.B. technically you can upgrade your installation to the latest std-names at any time, though we suspect it's rarely done!)
Another thought: in keeping with recent decisions, we might prefer to control loading via a context manager rather than adding keywords (but it's only a style thing).
In view of that and the above comments, @larsbarring, could you come up with an implementation proposal for enhanced load and save behaviours (or a utility)?
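For comparison with the keyword approach, a context manager would set the alias-handling mode for a block of code and restore the previous mode afterwards. A minimal sketch; `alias_handling` and `get_alias_mode` are hypothetical names, and the mode is stored per thread so that one thread's setting does not leak into another.

```python
import contextlib
import threading

_VALID_MODES = ("keep", "warn", "replace")
_state = threading.local()  # per-thread storage for the current mode


def get_alias_mode():
    """Return the alias-handling mode in force; "keep" (current behaviour) by default."""
    return getattr(_state, "mode", "keep")


@contextlib.contextmanager
def alias_handling(mode):
    """Temporarily set the alias-handling mode, restoring the previous one on exit."""
    if mode not in _VALID_MODES:
        raise ValueError(f"mode must be one of {_VALID_MODES}, got {mode!r}")
    previous = get_alias_mode()
    _state.mode = mode
    try:
        yield
    finally:
        _state.mode = previous
```

Loading code would then consult `get_alias_mode()`, and a user would write something like `with alias_handling("replace"): cubes = iris.load(path)`.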
In fact, I have been playing with some ideas during the last couple of days (as always, very much limited by pretty basic [pun intended] coding skills):
- Reorganising iris.std_names.py a bit to have a separate dict for aliases (done via an updated tools/generate_std_names.py). It now includes some table version information, and a separate dict for the standard name descriptions (optional when generated).
- Adding a new std_name_table.py containing the following functions:
  - get_convention -- return a tentative Conventions string
  - set_alias_processing -- define how to handle aliases ("keep": current behaviour; "warn": warn and update; "replace": silently update)
  - get_description -- return the standard name description
  - check_valid_standard_name -- check whether a name is a standard name or an alias, and do the translation if requested
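The alias-processing part of that list could look roughly like this. It is a sketch under assumptions only: the function names come from the list, but the bodies are guesses, and `STD_NAMES` / `ALIASES` are tiny hypothetical stand-ins for the reorganised std_names dicts. Ambiguous aliases (several target names) are assumed to have been excluded from `ALIASES`.

```python
import warnings

# Hypothetical stand-ins for the dicts the reorganised std_names module would hold.
STD_NAMES = {"new_quantity_name": {"canonical_units": "K"}}
ALIASES = {"old_quantity_name": "new_quantity_name"}

_MODES = ("keep", "warn", "replace")
_alias_processing = "keep"  # current Iris behaviour: aliases are accepted unchanged


def set_alias_processing(mode):
    """Choose how aliases are handled: "keep", "warn" (warn and update) or "replace"."""
    global _alias_processing
    if mode not in _MODES:
        raise ValueError(f"mode must be one of {_MODES}, got {mode!r}")
    _alias_processing = mode


def check_valid_standard_name(name):
    """Return the name to use, translating an alias according to the current mode.

    Raises ValueError if the name is neither a standard name nor an alias.
    """
    if name in STD_NAMES:
        return name
    if name not in ALIASES:
        raise ValueError(f"{name!r} is not a valid standard name")
    if _alias_processing == "keep":
        return name
    new_name = ALIASES[name]
    if _alias_processing == "warn":
        warnings.warn(f"Standard name {name!r} is an alias; using {new_name!r} instead.")
    return new_name
```

Keeping "keep" as the default means existing code sees no change unless it opts in.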
std_name_table is naively imported in iris.__init__, and std_name_table.check_valid_std_name is called in common.mixin.py. From a design point of view, I think this is as far as I can go. If you think these ideas, which are rather un-pythonic and un-anything, are worth considering, I would need some support to take them further; e.g. context managers are beyond my level...
And rereading your comment, @pp-mo, I think you hit the nail on the head when writing:
Also we have concerns that automatic standard-name translation on load is a bit "dangerous" in breaking user code, since the same code loading the same data would produce cubes with different names on an Iris / standard-name-table upgrade. (N.B. technically you can upgrade your installation to the latest std-names at any time, though we suspect it's rarely done!)
I totally agree, and this was my motivation for asking for standard name version information (#5255). And regarding upgrading the standard name table, I was asking myself whether this could be done more dynamically, i.e. if there were an iris.util.get_new_CF_standard_name_table that would basically do what is now done during setup?
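The core of such a function would be turning the table XML into module source, much as setup does now but with version information and a separate alias dict. A minimal sketch, assuming a top-level `<version_number>` element, `<entry id="...">` elements with `<canonical_units>`, and `<alias>` elements with `<entry_id>` children; `generate_std_names_source` is a hypothetical name, and downloading the XML is left out.

```python
import pprint
import xml.etree.ElementTree as ET


def generate_std_names_source(xml_text):
    """Build Python source defining VERSION, STD_NAMES and ALIASES from the table XML."""
    root = ET.fromstring(xml_text)
    version = root.findtext("version_number", default="unknown").strip()

    # Only genuine entries go into STD_NAMES; aliases are no longer "promoted".
    std_names = {}
    for entry in root.iter("entry"):
        units = entry.findtext("canonical_units", default="").strip()
        std_names[entry.get("id")] = {"canonical_units": units}

    # Keep only aliases with a single target; ambiguous ones cannot be auto-translated.
    aliases = {}
    for alias in root.iter("alias"):
        targets = [(e.text or "").strip() for e in alias.findall("entry_id")]
        if len(targets) == 1:
            aliases[alias.get("id")] = targets[0]

    return (
        f"VERSION = {version!r}\n"
        f"STD_NAMES = {pprint.pformat(std_names)}\n"
        f"ALIASES = {pprint.pformat(aliases)}\n"
    )
```

The returned text could be written over the installed std_names module; reloading it in a running session would be a separate concern.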
If we had the dictionary of aliases, it would be pretty easy to provide a callback function that renames the cubes. If the dictionary was public then users could use it to create their own callback functions, which would give maximum flexibility.
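With such a dictionary, a user callback might look like this. `iris.load` accepts a `callback` that is called as `callback(cube, field, filename)` for each loaded cube; the `ALIASES` contents and the name `rename_aliased_cube` are hypothetical.

```python
# Hypothetical alias dictionary: aliased (old) standard name -> current standard name.
ALIASES = {"old_quantity_name": "new_quantity_name"}


def rename_aliased_cube(cube, field, filename):
    """Load callback: replace an aliased standard_name with its current name."""
    # A cube without a standard name has standard_name None, which is never a key.
    new_name = ALIASES.get(cube.standard_name)
    if new_name is not None:
        cube.standard_name = new_name
```

Usage would then be `cubes = iris.load(path, callback=rename_aliased_cube)`, leaving the choice of whether to translate entirely with the user.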
OK, I have bitten the bullet and just made a POC PR (#5313). The dictionary @rcomer is asking for is available as iris.std_names.ALIASES.