cfgrib

Dimension mismatch in MARS data

Open juntyr opened this issue 1 year ago • 2 comments

What happened?

xarray failed to open a GRIB file with the cfgrib engine, raising a dimension mismatch error.

What are the steps to reproduce the bug?

import xarray as xr
xr.open_dataset("_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib", engine="cfgrib")

Version

0.9.14.0

Platform (OS and architecture)

MacOS, also occurs on Pyodide

Relevant log output

ecCodes provides no latitudes/longitudes for gridType='sh'
skipping variable: paramId==133 shortName='q'
Traceback (most recent call last):
  File "venv/lib/python3.10/site-packages/cfgrib/dataset.py", line 723, in build_dataset_components
    dict_merge(dimensions, dims)
  File "venv/lib/python3.10/site-packages/cfgrib/dataset.py", line 639, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='values' value=1639680 new_value=6599680
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "venv/lib/python3.10/site-packages/xarray/backends/api.py", line 588, in open_dataset
    backend_ds = backend.open_dataset(
  File "venv/lib/python3.10/site-packages/cfgrib/xarray_plugin.py", line 141, in open_dataset
    ds = xr.Dataset(vars, attrs=attrs)
  File "venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 713, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File "venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 427, in merge_data_and_coords
    return merge_core(
  File "venv/lib/python3.10/site-packages/xarray/core/merge.py", line 705, in merge_core
    dims = calculate_dimensions(variables)
  File "venv/lib/python3.10/site-packages/xarray/core/variable.py", line 3009, in calculate_dimensions
    raise ValueError(
ValueError: conflicting sizes for dimension 'values': length 6599680 on 'latitude' and length 1639680 on {'step': 'step', 'hybrid': 'hybrid', 'values': 't'}

Accompanying data

https://faubox.rrze.uni-erlangen.de/dl/fiVj21QV6ihsyWC8UEZYTT/_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib

Organisation

University of Helsinki

juntyr • Aug 15 '24 11:08

CC @SF-N

juntyr • Aug 15 '24 11:08

Hi @juntyr,

The problem is that the file contains two different variables whose geographical coordinates do not match: q is on a reduced Gaussian grid, while t is a spectral field and not on a grid at all. They therefore cannot be combined into a single hypercube.

% grib_ls ./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
edition      centre       date         dataType     gridType     stepRange    typeOfLevel  level        shortName    packingType
2            ecmf         20240811     cf           sh           354          hybrid       1            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   354          hybrid       1            q            grid_ccsds
2            ecmf         20240811     cf           sh           354          hybrid       2            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   354          hybrid       2            q            grid_ccsds
2            ecmf         20240811     cf           sh           360          hybrid       1            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   360          hybrid       1            q            grid_ccsds
2            ecmf         20240811     cf           sh           360          hybrid       2            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   360          hybrid       2            q            grid_ccsds
8 of 8 messages in ./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
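
The conflict itself can be reproduced without decoding any GRIB data. The following is a simplified sketch, not cfgrib's actual implementation: `merge_dims` and `DimensionConflict` are hypothetical names that mimic the kind of consistency check behind the `DatasetBuildError` in the log, using the two `values` sizes it reports and the step and level counts from the listing above.

```python
class DimensionConflict(Exception):
    """Hypothetical stand-in for cfgrib's DatasetBuildError."""


def merge_dims(master, new):
    """Merge dimension sizes into `master`, refusing conflicting sizes.

    Simplified illustration of the consistency check, not cfgrib's code.
    """
    for key, size in new.items():
        if key in master and master[key] != size:
            raise DimensionConflict(
                f"key={key!r} value={master[key]} new_value={size}"
            )
        master[key] = size
    return master


# 'values' sizes taken from the log output in the issue:
# 1639680 for the spectral field t, 6599680 for q on the reduced Gaussian grid.
# Two steps (354, 360) and two hybrid levels (1, 2) come from the grib_ls listing.
t_dims = {"step": 2, "hybrid": 2, "values": 1639680}
q_dims = {"step": 2, "hybrid": 2, "values": 6599680}

dims = merge_dims({}, t_dims)  # the first variable defines the dimensions
try:
    merge_dims(dims, q_dims)  # the second variable disagrees on 'values'
    conflict = None
except DimensionConflict as exc:
    conflict = str(exc)

print(conflict)
```

Because `step` and `hybrid` agree and only `values` differs, the two variables cannot share one set of dimensions, which is why they have to be opened separately.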

You can, however, use a bit of built-in functionality from cfgrib to split the data into two datasets - one for each variable:

import cfgrib
ds = cfgrib.open_datasets('_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib')
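
Note that `cfgrib.open_datasets` returns a list of datasets rather than a single one. As a rough sketch of how to pick out the dataset you need, the helpers below (`find_dataset` and `load_variable` are hypothetical names, not part of cfgrib) search that list by variable name. This assumes the variable you ask for was actually decoded into one of the returned datasets.

```python
def find_dataset(datasets, var_name):
    """Return the first dataset whose data variables include `var_name`."""
    for ds in datasets:
        if var_name in ds.data_vars:
            return ds
    raise KeyError(f"no dataset contains variable {var_name!r}")


def load_variable(path, var_name):
    """Open every hypercube in `path` and return the one holding `var_name`."""
    import cfgrib  # imported lazily so find_dataset works without cfgrib

    return find_dataset(cfgrib.open_datasets(path), var_name)
```

With the file from this issue, `load_variable(fname, "q")` would be expected to return the dataset on the reduced Gaussian grid, where `fname` is the path to the GRIB file.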

Alternatively, to get more control, you can use the backend kwargs to load just selected fields according to their properties, e.g.

import xarray as xr

fname = "_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib"
ds = xr.open_dataset(fname, engine="cfgrib", backend_kwargs={'filter_by_keys': {'gridType': 'reduced_gg'}})
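
The same filter can be applied once per grid type to get one dataset for each. This is only a sketch: `grid_filter_kwargs` and `open_per_grid_type` are hypothetical helper names, the default grid types are the two shown in the `grib_ls` listing, and whether the spectral (`sh`) field opens cleanly on its own has not been confirmed in this thread.

```python
def grid_filter_kwargs(grid_type):
    """Build the xarray keyword arguments that select a single gridType."""
    return {
        "engine": "cfgrib",
        "backend_kwargs": {"filter_by_keys": {"gridType": grid_type}},
    }


def open_per_grid_type(path, grid_types=("reduced_gg", "sh")):
    """Open `path` once per grid type and return {grid_type: dataset}."""
    import xarray as xr  # lazy import: only needed when a file is opened

    return {g: xr.open_dataset(path, **grid_filter_kwargs(g)) for g in grid_types}
```

Other GRIB keys work the same way, for example `{'shortName': 'q'}` to select by variable instead of by grid.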

I hope this helps!

iainrussell • Aug 27 '24 07:08