xarray
xarray copied to clipboard
Unclear error message when combine_by_coords doesn't find an index
What is your issue?
The error you get from inside xr.combine_by_coords
when a 1D dimension coordinate is not backed by an index uses outdated verbiage. That's because it predates the indexes refactor, and this fail case wasn't anticipated at the time of writing.
The reproducer below uses the VirtualiZarr
package, but only as a shortcut to generate a dataset that has 1D coordinates not backed by indexes. You could construct a pure-xarray reproducer.
In [1]: from virtualizarr import open_virtual_dataset
In [2]: import xarray as xr
In [3]: ds1 = open_virtual_dataset('air1.nc', indexes={})
In [4]: ds2 = open_virtual_dataset('air2.nc', indexes={})
In [5]: xr.combine_by_coords([ds1, ds2], coords='minimal', compat='override')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 1
----> 1 xr.combine_by_coords([ds1, ds2], coords='minimal', compat='override')
File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/core/combine.py:958, in combine_by_coords(data_objects, compat, data_vars, coords, fill_value, join, combine_attrs)
954 grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
956 # Perform the multidimensional combine on each group of data variables
957 # before merging back together
--> 958 concatenated_grouped_by_data_vars = tuple(
959 _combine_single_variable_hypercube(
960 tuple(datasets_with_same_vars),
961 fill_value=fill_value,
962 data_vars=data_vars,
963 coords=coords,
964 compat=compat,
965 join=join,
966 combine_attrs=combine_attrs,
967 )
968 for vars, datasets_with_same_vars in grouped_by_vars
969 )
971 return merge(
972 concatenated_grouped_by_data_vars,
973 compat=compat,
(...)
976 combine_attrs=combine_attrs,
977 )
File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/core/combine.py:959, in <genexpr>(.0)
954 grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
956 # Perform the multidimensional combine on each group of data variables
957 # before merging back together
958 concatenated_grouped_by_data_vars = tuple(
--> 959 _combine_single_variable_hypercube(
960 tuple(datasets_with_same_vars),
961 fill_value=fill_value,
962 data_vars=data_vars,
963 coords=coords,
964 compat=compat,
965 join=join,
966 combine_attrs=combine_attrs,
967 )
968 for vars, datasets_with_same_vars in grouped_by_vars
969 )
971 return merge(
972 concatenated_grouped_by_data_vars,
973 compat=compat,
(...)
976 combine_attrs=combine_attrs,
977 )
File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/core/combine.py:619, in _combine_single_variable_hypercube(datasets, fill_value, data_vars, coords, compat, join, combine_attrs)
613 if len(datasets) == 0:
614 raise ValueError(
615 "At least one Dataset is required to resolve variable names "
616 "for combined hypercube."
617 )
--> 619 combined_ids, concat_dims = _infer_concat_order_from_coords(list(datasets))
621 if fill_value is None:
622 # check that datasets form complete hypercube
623 _check_shape_tile_ids(combined_ids)
File ~/miniconda3/envs/numpy2.0_released/lib/python3.11/site-packages/xarray/core/combine.py:92, in _infer_concat_order_from_coords(datasets)
90 indexes = [ds._indexes.get(dim) for ds in datasets]
91 if any(index is None for index in indexes):
---> 92 raise ValueError(
93 "Every dimension needs a coordinate for "
94 "inferring concatenation order"
95 )
97 # TODO (benbovy, flexible indexes): support flexible indexes?
98 indexes = [index.to_pandas_index() for index in indexes]
ValueError: Every dimension needs a coordinate for inferring concatenation order
In this reproducer the dimension time
has a coordinate, it just doesn't have an index backing that coordinate. The error message also doesn't say which dimension is the problem.
This error message should say something more like
"ValueError: Every dimension requires a corresponding 1D coordinate and index for inferring concatenation order but the coordinate 'time' has no corresponding index"
One could even argue that the name combine_by_coords
should really be combine_using_indexes
...