Dimension mismatch in MARS data
What happened?
xarray failed to open a GRIB file with the cfgrib engine, raising a dimension-mismatch error.
What are the steps to reproduce the bug?
import xarray as xr
xr.open_dataset("_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib", engine="cfgrib")
Version
0.9.14.0
Platform (OS and architecture)
macOS; also occurs on Pyodide
Relevant log output
ecCodes provides no latitudes/longitudes for gridType='sh'
skipping variable: paramId==133 shortName='q'
Traceback (most recent call last):
File "venv/lib/python3.10/site-packages/cfgrib/dataset.py", line 723, in build_dataset_components
dict_merge(dimensions, dims)
File "venv/lib/python3.10/site-packages/cfgrib/dataset.py", line 639, in dict_merge
raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='values' value=1639680 new_value=6599680
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "venv/lib/python3.10/site-packages/xarray/backends/api.py", line 588, in open_dataset
backend_ds = backend.open_dataset(
File "venv/lib/python3.10/site-packages/cfgrib/xarray_plugin.py", line 141, in open_dataset
ds = xr.Dataset(vars, attrs=attrs)
File "venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 713, in __init__
variables, coord_names, dims, indexes, _ = merge_data_and_coords(
File "venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 427, in merge_data_and_coords
return merge_core(
File "venv/lib/python3.10/site-packages/xarray/core/merge.py", line 705, in merge_core
dims = calculate_dimensions(variables)
File "venv/lib/python3.10/site-packages/xarray/core/variable.py", line 3009, in calculate_dimensions
raise ValueError(
ValueError: conflicting sizes for dimension 'values': length 6599680 on 'latitude' and length 1639680 on {'step': 'step', 'hybrid': 'hybrid', 'values': 't'}
Accompanying data
https://faubox.rrze.uni-erlangen.de/dl/fiVj21QV6ihsyWC8UEZYTT/_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
Organisation
University of Helsinki
CC @SF-N
Hi @juntyr,
The problem is that the file contains two variables whose geographical coordinates do not match: q is on a reduced Gaussian grid, while t is a spectral field and not on a grid at all. They therefore cannot be combined into a single hypercube.
% grib_ls ./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
edition  centre  date      dataType  gridType    stepRange  typeOfLevel  level  shortName  packingType
2        ecmf    20240811  cf        sh          354        hybrid       1      t          spectral_complex
2        ecmf    20240811  cf        reduced_gg  354        hybrid       1      q          grid_ccsds
2        ecmf    20240811  cf        sh          354        hybrid       2      t          spectral_complex
2        ecmf    20240811  cf        reduced_gg  354        hybrid       2      q          grid_ccsds
2        ecmf    20240811  cf        sh          360        hybrid       1      t          spectral_complex
2        ecmf    20240811  cf        reduced_gg  360        hybrid       1      q          grid_ccsds
2        ecmf    20240811  cf        sh          360        hybrid       2      t          spectral_complex
2        ecmf    20240811  cf        reduced_gg  360        hybrid       2      q          grid_ccsds
8 of 8 messages in ./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
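To make the mismatch concrete, here is a minimal Python sketch of why the two fields cannot share one 'values' dimension. The sizes are copied from the error messages in the report, and the step and level counts from the grib_ls listing. `merge_dimensions` is a hypothetical stand-in for a strict dimension merge, not cfgrib's actual implementation, and attributing the larger size to q is my assumption (latitude is only available for the gridded field).

```python
# Sizes of the 'values' dimension, copied from the error messages in the report.
T_SPECTRAL_VALUES = 1_639_680  # t, gridType 'sh' (spectral coefficients)
Q_GRID_VALUES = 6_599_680      # assumed to be q, gridType 'reduced_gg' (grid points)


def merge_dimensions(master, update):
    """Hypothetical strict merge: a dimension may only ever have one size."""
    for key, value in update.items():
        if key in master and master[key] != value:
            raise ValueError(
                f"key present and new value is different: "
                f"key={key!r} value={master[key]!r} new_value={value!r}"
            )
        master[key] = value
    return master


dimensions = {}
# Two steps (354, 360) and two hybrid levels (1, 2), as listed by grib_ls.
merge_dimensions(dimensions, {"step": 2, "hybrid": 2, "values": T_SPECTRAL_VALUES})

try:
    # The second variable agrees on step and hybrid but not on 'values'.
    merge_dimensions(dimensions, {"step": 2, "hybrid": 2, "values": Q_GRID_VALUES})
except ValueError as err:
    conflict = str(err)
    print(conflict)
```

The first variable fixes the size of 'values', so the second one is rejected rather than being silently reshaped.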
You can, however, use a bit of built-in functionality from cfgrib to split the data into separate datasets, here one for each variable. Note that open_datasets returns a list of xarray datasets rather than a single one:
import cfgrib
datasets = cfgrib.open_datasets('_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib')
Alternatively, for more control, you can use the backend kwargs to load only the fields that match selected GRIB keys, for example:
import xarray as xr
fname = "_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib"
ds = xr.open_dataset(fname, engine="cfgrib", backend_kwargs={'filter_by_keys': {'gridType': 'reduced_gg'}})
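If you want both fields, the same filter could be applied once per grid type. This is a rough sketch that I have not run against this file: `backend_kwargs_for` and `open_per_grid_type` are hypothetical helper names, and the default grid types are simply the two reported by grib_ls.

```python
def backend_kwargs_for(grid_type):
    """Build cfgrib backend kwargs that select a single gridType."""
    return {"filter_by_keys": {"gridType": grid_type}}


def open_per_grid_type(path, grid_types=("sh", "reduced_gg")):
    """Open one dataset per grid type, keyed by gridType.

    Requires xarray, cfgrib and the GRIB file to be available.
    """
    import xarray as xr  # imported here so the helper above has no dependencies

    return {
        grid_type: xr.open_dataset(
            path,
            engine="cfgrib",
            backend_kwargs=backend_kwargs_for(grid_type),
        )
        for grid_type in grid_types
    }
```

Calling `open_per_grid_type(fname)` should then give a dict with one entry for the spectral field and one for the gridded field, so neither has to share a 'values' dimension with the other.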
I hope this helps!