iris-grib
iris-grib copied to clipboard
Breaking change introduced between v0.17 and v0.21
We have recieved files which used to load properly in an environment with v0.17 and fail to load in v0.21. The error raised is:
"Product definition section 4 has missing scaled value of second fixed surface"
For another file, loading succeeds but happens differently.
Potential causes for this could be #318 or perhaps an update to eccodes.
I have raised this for discussion between team leadership, as this change in behaviour is deliberate, and more in line with Iris philosophy, but at the same time is clearly counter-productive to our user base.
This was discussed with @vsherratt at an in-house "Surgery" day, 2025-06-03, who also provided an example file for debugging + testing.
It looks like the file loads differently via eccodes (and later iris-grib) than it did in grib-api (and earlier iris-grib).
We suspect that the earlier grib-api returned 255/65535 on fetching missing/"unset" words, but the newer eccodes is returning "None"s in these cases, and iris-grib load code has been adjusted to suit. The release notes / comments aren't entirely clear on this, it needs checking out.
From some debugging with old+new environments and a suitable test file, a few observations... The problem messages in the test file have the following key properties :
typeOfFirstFixedSurface = 106 [Depth below land surface (m) (grib2/tables/5/4.5.table) ]
scaleFactorOfFirstFixedSurface = 0
scaledValueOfFirstFixedSurface = 100
typeOfSecondFixedSurface = 106 [Depth below land surface (m) (grib2/tables/5/4.5.table) ]
scaleFactorOfSecondFixedSurface = 0
scaledValueOfSecondFixedSurface = MISSING
This may seem a bit odd, but there are a series of different messages with levels of 0-7, 7-28, 28-100 and 100-(missing), so I think we can see what is intended to be conveyed here (though equally clearly, this is won't fit well to CF encoding).
Regarding the breaking change, it seems the key point here is that the basic logic of vertical level handling has not changed significantly between v0.17 and v0.21, but what happened is that we added support for arbitrary unsupported fixed surface types, whereas previously we refused to handle unrecognised types. Unfortunately in this case, that leads to these messages causing an error (failed to get a second surface value), whereas previously they only raised a warning "surface type unrecognised" -- though in the old world, the vertical levels were then not processed into a coordinate.
So, it's a bit of a moot point whether the encoding is this file is "correct" or not. From the wider context, it does look like it is intentional, and intended to convey something definite. The error case isn't really a new one : it would have occurred in the past, if the level type was set to a recognised one.
The problem messages in the test file have the following key properties ... this is won't fit well to CF encoding
On further inspection, it turns out that the test file has groups of messages which despite having different parameter ids "go together", and have different vertical parameter ranges, encoded as fixed surfaces. One of these has a missing upper boundary which appears to be used meaning something like "and upwards". E.G. Like this :
msg param depths
413 39 0 - 7
347 40 7 - 28
295 41 28 - 100
698 42 100 - xx
Unless it becomes clear that this kind of encoding is common, we're unlike to make generic code changes to accommodate it, so (sticking my neck out) I'd expect a user to employ the ("messages-from-filename"; modify-messages; "load-pairs-from-fields") route to create a custom load handler for this. In this case, I've suggested something like this ...
def adjust_message(msg: GribMessage) -> GribMessage:
# Identify any messages with specific properties, and patch them
if msg.sections[0]["editionNumber"] == 2:
s4 = msg.sections[4]
if (
s4["productDefinitionTemplateNumber"] == 0
and s4["parameterCategory"] == 128
and s4["parameterNumber"] in [42, 236]
and s4["typeOfFirstFixedSurface"] == 106
and s4["typeOfSecondFixedSurface"] == 106
and s4["scaleFactorOfFirstFixedSurface"] == 0
and s4["scaledValueOfFirstFixedSurface"] == 100
# specific messages leveltype=106 have first-level=100 and second-level=None
and s4["scaledValueOfSecondFixedSurface"] in [None, 65535]
):
# N.B. **IMPORTANT** a bug you must workaround :
# - in order to overwrite a key you must have read it first
_ = s4["scaleFactorOfSecondFixedSurface"]
s4["scaleFactorOfSecondFixedSurface"] = 0
s4["scaledValueOfSecondFixedSurface"] = 289 # a magic number !
return msg
# Get (iterator over) fields from file
fields = GribMessage.messages_from_filename(filename)
# Pass through the correction routine
fields = (adjust_message(msg) for msg in fields)
# Convert to (raw) cubes
cubes = CubeList(cube for cube, _ in load_pairs_from_fields(fields))
# Combine to mimic a normal Iris load.
cubes = cubes.combine()
However, this does raise quite a few really worthwhile general points:
- the documentation section on this has a lot of scope for improvement
- it is poorly sited, and poorly named ("working with grib messages")
- it doesn't show how to modify messages, only how to select a subset
- the example doesn't show how to replace a 'load' operation
- follow-on point : modifying message keys has (what I think is) a bug which currently needs to be understood + worked around
- follow-on point : docs are unreasonably hard to navigate (partly due to the lack of a TOC)
Need to check whether #647 has affected the problems discussed here (maybe fixed?! 🤞), since it was also working on the loading of fixed surfaces.