kerchunk
NOAA NCEP Grib2 GFS & HRRR: levels, steps and duplicate variables!
It seems NCEP adds several custom encodings to the WMO GRIB2 standard that the cfgrib library can't decode.
In the HRRR SubHourly product the step is encoded as a range, which breaks the cfgrib reader. When reading these variables with scan_grib you will get messages like:
2024-01-09T16:59:33.060Z MainProcess MainThread WARNING:grib2-to-zarr:Ignoring coordinate 'step' for varname 'vbdsf', raises: eccodes.WrongStepUnitError(Wrong units for step (step must be integer))
for the variables `dswrf`, `vbdsf`, `tp`, `sdwe` and `unknown`. Some of the grib messages fail to decode entirely, which produces the `unknown` entries. The step can be inferred from the run time and the valid time of the model, but I think NCEP was trying to encode the duration of the average for the variables with stepType `avg`.
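As a rough illustration of inferring the step from the run time and valid time, here is a minimal sketch. The helper name `infer_step` is hypothetical and not part of kerchunk or cfgrib; it only assumes that both timestamps are available for a message.

```python
from datetime import datetime, timedelta


def infer_step(run_time: datetime, valid_time: datetime) -> timedelta:
    """Return the forecast step as valid_time minus the model run time.

    Hypothetical helper for illustration; not a kerchunk or cfgrib API.
    """
    step = valid_time - run_time
    if step < timedelta(0):
        raise ValueError("valid_time precedes run_time")
    return step


# Example: a 00Z run on 2023-10-01 with a valid time of 06Z the same day.
step = infer_step(datetime(2023, 10, 1, 0), datetime(2023, 10, 1, 6))
print(step)  # 6:00:00
```

This recovers only the end of the interval. It says nothing about the averaging or accumulation window that the range encoding was presumably meant to carry.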
By comparing the results of using scan_grib and parsing the idx files provided by NCEP, I was able to identify a few more edge cases. The table below shows some of the variables from gs://global-forecast-system/gfs.20231001/00/atmos/gfs.t00z.pgrb2.0p25.f006 which have duplicate variable name, step type, level type and level. Currently, the grib_tree method assumes these will be unique and silently takes the data from the last message in the file.
There are two types of duplicates I have found so far:
- The GFS grib2 files include two accumulations for Convective Precipitation and Total Precipitation. One is the accumulation during the current model step and one is the total accumulation during the forecast run so far. With the step value parsed by cfgrib this is ambiguous for all model horizons (0 to 240 hour forecast files). With the idx file (gs://global-forecast-system/gfs.20231001/00/atmos/gfs.t00z.pgrb2.0p25.f006.idx) we can see a bit more metadata, for example `ACPCP:surface:0-6 hour acc fcst`, but for the first few timesteps of the model even the idx values appear to be duplicates because the total is equal to the step accumulation.
- There are several other variables that have a level range, such as `180-0 mb above ground` and `0.44-1 sigma layer`, which decode as NaN with cfgrib (via kerchunk scan_grib). These result in additional duplicates which can confuse grib_tree (and anybody using it).
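The idx comparison described above can be sketched roughly as follows. This assumes each idx line has the colon-separated layout `message:offset:date:variable:level:forecast:`, which matches the fields shown in the table but is not confirmed by any specification here. The names `IdxEntry`, `parse_idx` and `find_duplicates` are hypothetical, and the message numbers in the sample text are made up; only the offsets and descriptions come from the table.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class IdxEntry(NamedTuple):
    msg: str            # message number, kept as text
    offset: int         # byte offset of the message in the grib file
    date: str           # e.g. "d=2023100100"
    var: str            # e.g. "ACPCP"
    level: str          # e.g. "180-0 mb above ground"
    forecast: str       # e.g. "0-6 hour acc fcst"
    length: Optional[int]  # bytes to the next message, None for the last


def parse_idx(text: str) -> list:
    """Parse idx text (assumed layout) and derive lengths from offsets."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        msg, offset, date, var, level, forecast = line.split(":")[:6]
        rows.append((msg, int(offset), date, var, level, forecast))
    entries = []
    for i, row in enumerate(rows):
        # The length is the distance to the next offset; unknown for the last row.
        nxt = rows[i + 1][1] if i + 1 < len(rows) else None
        entries.append(IdxEntry(*row, length=None if nxt is None else nxt - row[1]))
    return entries


def find_duplicates(entries):
    """Group entries by (variable, level, forecast) and keep colliding groups."""
    groups = defaultdict(list)
    for e in entries:
        groups[(e.var, e.level, e.forecast)].append(e)
    return {key: grp for key, grp in groups.items() if len(grp) > 1}


# Offsets and descriptions taken from the table; message numbers are hypothetical.
SAMPLE_IDX = """\
1:426078582:d=2023100100:ACPCP:surface:0-6 hour acc fcst:
2:426358213:d=2023100100:ACPCP:surface:0-6 hour acc fcst:
"""

for key, grp in find_duplicates(parse_idx(SAMPLE_IDX)).items():
    print(key, [e.offset for e in grp])
```

Keeping the level as the raw idx string preserves ranges such as `180-0 mb above ground`, which is the information lost when the level decodes as NaN.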
varname | typeOfLevel | stepType | level | offset_idx | date | attrs | length_idx | idx_uri | grib_uri | idx_indexed_at | grib_crc32 | grib_updated_at | idx_crc32 | idx_updated_at | name | step | time | valid_time | uri | offset_grib | length_grib | inline_value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
acpcp | surface | accum | 0.0 | 426078582 | d=2023100100 | ACPCP:surface:0-6 hour acc fcst:\n | 279631 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective precipitation (water) | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 426078582 | 279631 | None |
acpcp | surface | accum | 0.0 | 426358213 | d=2023100100 | ACPCP:surface:0-6 hour acc fcst:\n | 279631 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective precipitation (water) | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 426358213 | 279631 | None |
cape | pressureFromGroundLayer | instant | NaN | 515902483 | d=2023100100 | CAPE:180-0 mb above ground:6 hour fcst:\n | 530643 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective available potential energy | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 515902483 | 530643 | None |
cape | pressureFromGroundLayer | instant | NaN | 526644614 | d=2023100100 | CAPE:90-0 mb above ground:6 hour fcst:\n | 479705 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective available potential energy | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 526644614 | 479705 | None |
cape | pressureFromGroundLayer | instant | NaN | 527482311 | d=2023100100 | CAPE:255-0 mb above ground:6 hour fcst:\n | 514093 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective available potential energy | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 527482311 | 514093 | None |
cin | pressureFromGroundLayer | instant | NaN | 516433126 | d=2023100100 | CIN:180-0 mb above ground:6 hour fcst:\n | 343271 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective inhibition | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 516433126 | 343271 | None |
cin | pressureFromGroundLayer | instant | NaN | 527124319 | d=2023100100 | CIN:90-0 mb above ground:6 hour fcst:\n | 357992 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective inhibition | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 527124319 | 357992 | None |
cin | pressureFromGroundLayer | instant | NaN | 527996404 | d=2023100100 | CIN:255-0 mb above ground:6 hour fcst:\n | 306931 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective inhibition | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 527996404 | 306931 | None |
r | sigmaLayer | instant | NaN | 518249965 | d=2023100100 | RH:0.33-1 sigma layer:6 hour fcst:\n | 727263 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Relative humidity | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 518249965 | 727263 | None |
r | sigmaLayer | instant | NaN | 518977228 | d=2023100100 | RH:0.44-1 sigma layer:6 hour fcst:\n | 714324 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Relative humidity | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06 |
Fixing the actual decoding of the variables is hard. It may be possible by adding custom ecCodes definitions.
In the meantime, I want this issue to exist in the world for anyone else wondering what is going on.
Suggestions on improving the behavior of grib_tree in the meantime would be welcome. At present it silently picks the last grib message and uses its data (offset and length) for the given variable. This might be more than a little surprising for some users.
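One possible direction, sketched under assumptions: group the scanned messages by the key that grib_tree treats as unique and warn on collisions instead of overwriting silently. This is not kerchunk's implementation. The dict layout and the names `normalize_level`, `group_messages` and `select_unique` are hypothetical, and the sample rows reuse values from the table above.

```python
import math
import warnings
from collections import defaultdict


def normalize_level(level):
    """Map NaN to None so undecoded levels compare equal as dict keys."""
    if isinstance(level, float) and math.isnan(level):
        return None
    return level


def group_messages(messages):
    """Group messages by (varname, typeOfLevel, stepType, level)."""
    groups = defaultdict(list)
    for m in messages:
        key = (m["varname"], m["typeOfLevel"], m["stepType"],
               normalize_level(m["level"]))
        groups[key].append(m)
    return groups


def select_unique(messages):
    """Keep the last message per key, but warn whenever a key collides."""
    selected = {}
    for key, msgs in group_messages(messages).items():
        if len(msgs) > 1:
            offsets = [m["offset"] for m in msgs]
            warnings.warn(f"{len(msgs)} messages share key {key}; "
                          f"keeping the last (offsets: {offsets})")
        selected[key] = msgs[-1]
    return selected


# Rows copied from the table above (hypothetical dict layout).
MESSAGES = [
    {"varname": "acpcp", "typeOfLevel": "surface", "stepType": "accum",
     "level": 0.0, "offset": 426078582},
    {"varname": "acpcp", "typeOfLevel": "surface", "stepType": "accum",
     "level": 0.0, "offset": 426358213},
    {"varname": "cape", "typeOfLevel": "pressureFromGroundLayer",
     "stepType": "instant", "level": float("nan"), "offset": 515902483},
    {"varname": "cape", "typeOfLevel": "pressureFromGroundLayer",
     "stepType": "instant", "level": float("nan"), "offset": 526644614},
    {"varname": "cape", "typeOfLevel": "pressureFromGroundLayer",
     "stepType": "instant", "level": float("nan"), "offset": 527482311},
]

# List every colliding key with the offsets of the messages that share it.
for key, msgs in group_messages(MESSAGES).items():
    if len(msgs) > 1:
        print(key, [m["offset"] for m in msgs])
```

NaN never compares equal to itself, so separate NaN levels would not reliably land in the same group; normalizing them to `None` makes the collision visible. Calling `select_unique(MESSAGES)` keeps the current last-message-wins behavior while surfacing each collision as a warning.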
The NCEP team would like to expose their grib tables in a machine-readable form! See https://github.com/NOAA-EMC/NCEPLIBS-grib_util/issues/293#issuecomment-2015611772. This would provide the data needed to generate the custom ecCodes definitions.