Unclear error message when combine_by_coords doesn't find an index

Open TomNicholas opened this issue 6 days ago • 4 comments

What is your issue?

The error raised inside xr.combine_by_coords when a 1D dimension coordinate is not backed by an index uses outdated wording. The message predates the indexes refactor, and this failure case wasn't anticipated when it was written.

The reproducer below uses the VirtualiZarr package, but only as a shortcut to generate a dataset whose 1D coordinates are not backed by indexes. A pure-xarray reproducer could be constructed as well.
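For reference, a pure-xarray reproducer might look like the sketch below. It assumes an xarray version that provides Dataset.drop_indexes; the helper name make_unindexed, the variable name air, and the toy data are illustrative and do not come from the original files. The exact wording of the exception depends on the installed xarray version.

```python
import numpy as np
import xarray as xr


def make_unindexed(start):
    """Hypothetical helper: a dataset whose 'time' coordinate has no index."""
    ds = xr.Dataset(
        {"air": ("time", np.zeros(3))},
        coords={"time": np.arange(start, start + 3)},
    )
    # Keep the 'time' coordinate variable but drop the index backing it.
    return ds.drop_indexes("time")


ds1 = make_unindexed(0)
ds2 = make_unindexed(3)

message = None
try:
    xr.combine_by_coords([ds1, ds2])
except ValueError as err:
    # combine_by_coords cannot infer the order without an index on 'time'.
    message = str(err)

print(message)
```

Here time is still present in ds1.coords but absent from ds1.xindexes, which is the same situation the VirtualiZarr datasets are in.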

In [1]: from virtualizarr import open_virtual_dataset

In [2]: import xarray as xr

In [3]: ds1 = open_virtual_dataset('air1.nc', indexes={})

In [4]: ds2 = open_virtual_dataset('air2.nc', indexes={})

In [5]: xr.combine_by_coords([ds1, ds2], coords='minimal', compat='override')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 xr.combine_by_coords([ds1, ds2], coords='minimal', compat='override')

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/core/combine.py:958, in combine_by_coords(data_objects, compat, data_vars, coords, fill_value, join, combine_attrs)
    954     grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
    956     # Perform the multidimensional combine on each group of data variables
    957     # before merging back together
--> 958     concatenated_grouped_by_data_vars = tuple(
    959         _combine_single_variable_hypercube(
    960             tuple(datasets_with_same_vars),
    961             fill_value=fill_value,
    962             data_vars=data_vars,
    963             coords=coords,
    964             compat=compat,
    965             join=join,
    966             combine_attrs=combine_attrs,
    967         )
    968         for vars, datasets_with_same_vars in grouped_by_vars
    969     )
    971 return merge(
    972     concatenated_grouped_by_data_vars,
    973     compat=compat,
   (...)
    976     combine_attrs=combine_attrs,
    977 )

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/core/combine.py:959, in <genexpr>(.0)
    954     grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
    956     # Perform the multidimensional combine on each group of data variables
    957     # before merging back together
    958     concatenated_grouped_by_data_vars = tuple(
--> 959         _combine_single_variable_hypercube(
    960             tuple(datasets_with_same_vars),
    961             fill_value=fill_value,
    962             data_vars=data_vars,
    963             coords=coords,
    964             compat=compat,
    965             join=join,
    966             combine_attrs=combine_attrs,
    967         )
    968         for vars, datasets_with_same_vars in grouped_by_vars
    969     )
    971 return merge(
    972     concatenated_grouped_by_data_vars,
    973     compat=compat,
   (...)
    976     combine_attrs=combine_attrs,
    977 )

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/core/combine.py:619, in _combine_single_variable_hypercube(datasets, fill_value, data_vars, coords, compat, join, combine_attrs)
    613 if len(datasets) == 0:
    614     raise ValueError(
    615         "At least one Dataset is required to resolve variable names "
    616         "for combined hypercube."
    617     )
--> 619 combined_ids, concat_dims = _infer_concat_order_from_coords(list(datasets))
    621 if fill_value is None:
    622     # check that datasets form complete hypercube
    623     _check_shape_tile_ids(combined_ids)

File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/core/combine.py:92, in _infer_concat_order_from_coords(datasets)
     90 indexes = [ds._indexes.get(dim) for ds in datasets]
     91 if any(index is None for index in indexes):
---> 92     raise ValueError(
     93         "Every dimension needs a coordinate for "
     94         "inferring concatenation order"
     95     )
     97 # TODO (benbovy, flexible indexes): support flexible indexes?
     98 indexes = [index.to_pandas_index() for index in indexes]

ValueError: Every dimension needs a coordinate for inferring concatenation order

In this reproducer the dimension time does have a coordinate; it just doesn't have an index backing that coordinate. The error message also doesn't say which dimension is the problem.

The error message should say something more like:

"ValueError: Every dimension requires a corresponding 1D coordinate and index for inferring concatenation order but the coordinate 'time' has no corresponding index"
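As a rough illustration of how the check could name the offending dimension, here is a standalone sketch. The function check_dim_indexes is hypothetical and is not xarray's code; plain dicts stand in for each dataset's index mapping, mirroring the `ds._indexes.get(dim)` lookup shown in the traceback.

```python
def check_dim_indexes(dim, indexes):
    """Hypothetical check: raise an error naming `dim` if any index is missing."""
    if any(index is None for index in indexes):
        raise ValueError(
            "Every dimension requires a corresponding 1D coordinate and index "
            "for inferring concatenation order but the coordinate "
            f"'{dim}' has no corresponding index"
        )


# Stand-ins for per-dataset index mappings; neither has an index for 'time'.
fake_indexes = [{}, {}]

try:
    check_dim_indexes("time", [idx.get("time") for idx in fake_indexes])
except ValueError as err:
    print(err)
```

Passing the dimension name into the message is the only change relative to the existing check; the condition that triggers the error stays the same.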

One could even argue that the name combine_by_coords should really be combine_using_indexes ...

TomNicholas, Jul 01 '24 16:07