iris-grib icon indicating copy to clipboard operation
iris-grib copied to clipboard

Breaking change introduced between v0.17 and v0.21

Open stephenworsley opened this issue 6 months ago • 4 comments

We have recieved files which used to load properly in an environment with v0.17 and fail to load in v0.21. The error raised is: "Product definition section 4 has missing scaled value of second fixed surface" For another file, loading succeeds but happens differently.

Potential causes for this could be #318 or perhaps an update to eccodes.

stephenworsley avatar Jun 03 '25 15:06 stephenworsley

I have raised this for discussion between team leadership, as this change in behaviour is deliberate, and more in line with Iris philosophy, but at the same time is clearly counter-productive to our user base.

trexfeathers avatar Jun 05 '25 08:06 trexfeathers

This was discussed with @vsherratt at an in-house "Surgery" day, 2025-06-03, who also provided an example file for debugging + testing.

It looks like the file loads differently via eccodes (and later iris-grib) than it did in grib-api (and earlier iris-grib).

We suspect that the earlier grib-api returned 255/65535 on fetching missing/"unset" words, but the newer eccodes is returning "None"s in these cases, and iris-grib load code has been adjusted to suit. The release notes / comments aren't entirely clear on this, it needs checking out.

pp-mo avatar Jun 10 '25 10:06 pp-mo

From some debugging with old+new environments and a suitable test file, a few observations... The problem messages in the test file have the following key properties :

typeOfFirstFixedSurface = 106 [Depth below land surface  (m)  (grib2/tables/5/4.5.table) ]
scaleFactorOfFirstFixedSurface = 0
scaledValueOfFirstFixedSurface = 100
typeOfSecondFixedSurface = 106 [Depth below land surface  (m)  (grib2/tables/5/4.5.table) ]
scaleFactorOfSecondFixedSurface = 0
scaledValueOfSecondFixedSurface = MISSING

This may seem a bit odd, but there are a series of different messages with levels of 0-7, 7-28, 28-100 and 100-(missing), so I think we can see what is intended to be conveyed here (though equally clearly, this is won't fit well to CF encoding).

Regarding the breaking change, it seems the key point here is that the basic logic of vertical level handling has not changed significantly between v0.17 and v0.21, but what happened is that we added support for arbitrary unsupported fixed surface types, whereas previously we refused to handle unrecognised types. Unfortunately in this case, that leads to these messages causing an error (failed to get a second surface value), whereas previously they only raised a warning "surface type unrecognised" -- though in the old world, the vertical levels were then not processed into a coordinate.

So, it's a bit of a moot point whether the encoding is this file is "correct" or not. From the wider context, it does look like it is intentional, and intended to convey something definite. The error case isn't really a new one : it would have occurred in the past, if the level type was set to a recognised one.

pp-mo avatar Jun 11 '25 14:06 pp-mo

The problem messages in the test file have the following key properties ... this is won't fit well to CF encoding

On further inspection, it turns out that the test file has groups of messages which despite having different parameter ids "go together", and have different vertical parameter ranges, encoded as fixed surfaces. One of these has a missing upper boundary which appears to be used meaning something like "and upwards". E.G. Like this :

    msg     param   depths
    413     39      0       -   7
    347     40      7       -   28
    295     41      28      -   100
    698     42      100     -   xx

Unless it becomes clear that this kind of encoding is common, we're unlike to make generic code changes to accommodate it, so (sticking my neck out) I'd expect a user to employ the ("messages-from-filename"; modify-messages; "load-pairs-from-fields") route to create a custom load handler for this. In this case, I've suggested something like this ...

def adjust_message(msg: GribMessage) -> GribMessage:
    # Identify any messages with specific properties, and patch them
    if msg.sections[0]["editionNumber"] == 2:
        s4 = msg.sections[4]
        if (
                s4["productDefinitionTemplateNumber"] == 0
            and s4["parameterCategory"] == 128
            and s4["parameterNumber"] in [42, 236]
            and s4["typeOfFirstFixedSurface"] == 106
            and s4["typeOfSecondFixedSurface"] == 106
            and s4["scaleFactorOfFirstFixedSurface"] == 0
            and s4["scaledValueOfFirstFixedSurface"] == 100
            # specific messages leveltype=106 have first-level=100 and second-level=None
            and s4["scaledValueOfSecondFixedSurface"] in [None, 65535]
        ):
            # N.B. **IMPORTANT** a bug you must workaround :
            #  - in order to overwrite a key you must have read it first
            _ = s4["scaleFactorOfSecondFixedSurface"]
            s4["scaleFactorOfSecondFixedSurface"] = 0
            s4["scaledValueOfSecondFixedSurface"] = 289  # a magic number !

    return msg

# Get (iterator over) fields from file
fields = GribMessage.messages_from_filename(filename)
# Pass through the correction routine
fields = (adjust_message(msg) for msg in fields)
# Convert to (raw) cubes
cubes = CubeList(cube for cube, _ in load_pairs_from_fields(fields))
# Combine to mimic a normal Iris load.
cubes = cubes.combine()

However, this does raise quite a few really worthwhile general points:

  1. the documentation section on this has a lot of scope for improvement
    • it is poorly sited, and poorly named ("working with grib messages")
    • it doesn't show how to modify messages, only how to select a subset
    • the example doesn't show how to replace a 'load' operation
  2. follow-on point : modifying message keys has (what I think is) a bug which currently needs to be understood + worked around
  3. follow-on point : docs are unreasonably hard to navigate (partly due to the lack of a TOC)

pp-mo avatar Jun 12 '25 15:06 pp-mo

Need to check whether #647 has affected the problems discussed here (maybe fixed?! 🤞), since it was also working on the loading of fixed surfaces.

trexfeathers avatar Nov 13 '25 10:11 trexfeathers