xarray
xr.concat concatenates along dimensions that it wasn't asked to
What happened?
Here are two toy datasets designed to represent sections of a dataset that has variables living on a staggered grid. This type of dataset is common in fluid modelling (it's why xGCM exists).
import xarray as xr
ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)
ds2 = xr.Dataset(
    coords={
        'x_center': ('x_center', [4, 5, 6]),
        'x_outer': ('x_outer', [4.5, 5.5, 6.5]),
    },
)
Calling xr.concat on these with dim='x_center' happily concatenates them:
xr.concat([ds1, ds2], dim='x_center')
<xarray.Dataset>
Dimensions:   (x_outer: 7, x_center: 6)
Coordinates:
  * x_outer   (x_outer) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5
  * x_center  (x_center) int64 1 2 3 4 5 6
Data variables:
    *empty*
But notice that the returned result has been concatenated along both x_center and x_outer.
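To spot this kind of silent growth programmatically, one can compare dimension sizes before and after the call. The helper unexpected_dim_changes below is hypothetical (it is not an xarray function); it works on plain size mappings, so it accepts ds.sizes from xarray objects as well as ordinary dicts. Here it is fed the sizes from the output above.

```python
from collections.abc import Mapping, Sequence


def unexpected_dim_changes(
    input_sizes: Sequence[Mapping[str, int]],
    result_sizes: Mapping[str, int],
    dim: str,
) -> set[str]:
    """Return the dimensions, other than `dim`, whose size in the result
    differs from their size in at least one of the inputs.

    Hypothetical helper for illustration; not part of xarray.
    """
    changed = set()
    for name, size in result_sizes.items():
        if name == dim:
            continue  # growth along the requested dimension is expected
        for sizes in input_sizes:
            if name in sizes and sizes[name] != size:
                changed.add(name)
                break
    return changed


# Sizes taken from the example above (ds1, ds2 and the concatenated result).
# With real datasets this would be:
#   unexpected_dim_changes([ds1.sizes, ds2.sizes], result.sizes, "x_center")
ds1_sizes = {"x_center": 3, "x_outer": 4}
ds2_sizes = {"x_center": 3, "x_outer": 3}
result_sizes = {"x_outer": 7, "x_center": 6}

print(unexpected_dim_changes([ds1_sizes, ds2_sizes], result_sizes, "x_center"))
# → {'x_outer'}
```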
What did you expect to happen?
I did not expect this to work. I definitely didn't expect the datasets to be concatenated along a dimension I didn't ask them to be concatenated along (i.e. x_outer).
What I expected to happen was that (since coords='different' by default) xarray would attempt to concatenate both variables along the x_center dimension, which would have succeeded for the x_center variable but failed for the x_outer variable. Indeed, if I store them as data variables under different names, so that they are no longer coordinate variables, that is what happens:
import xarray as xr
ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)
ds2 = xr.Dataset(
    data_vars={
        'a': ('x_center', [4, 5, 6]),
        'b': ('x_outer', [4.5, 5.5, 6.5]),
    },
)
xr.concat([ds1, ds2], dim='x_center', data_vars='different')
ValueError: cannot reindex or align along dimension 'x_outer' because of conflicting dimension sizes: {3, 4}
Minimal Complete Verifiable Example
No response
MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
No response
Anything else we need to know?
I was trying to create an example for which you would need the automatic combined concat/merge that happens within xr.combine_by_coords.
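For context, the combined concat/merge mentioned above can be seen on a small input. In this minimal sketch (the variable names a and b and the coordinate x are made up for illustration), xr.combine_by_coords concatenates each variable's pieces along x, ordering them by their coordinate values, and then merges the resulting variables into a single Dataset.

```python
import xarray as xr

# Four pieces: two variables ('a' and 'b'), each split into two chunks along x.
parts = [
    xr.Dataset({"a": ("x", [1, 2, 3])}, coords={"x": [1, 2, 3]}),
    xr.Dataset({"a": ("x", [4, 5, 6])}, coords={"x": [4, 5, 6]}),
    xr.Dataset({"b": ("x", [10, 20, 30])}, coords={"x": [1, 2, 3]}),
    xr.Dataset({"b": ("x", [40, 50, 60])}, coords={"x": [4, 5, 6]}),
]

# Pieces holding the same data variables are concatenated along x
# (ordered by the x coordinate values); the per-variable results are
# then merged into one Dataset.
combined = xr.combine_by_coords(parts)
print(combined)
```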
Environment
xarray 2023.8.0
This is a consequence of the alignment behavior described in #6806.
The PR suggests using the proposed join='strict' kwarg.
test_concat_join_coordinate_variables_non_asked_dims tests:
ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)
Calling xr.concat([ds1, ds2], dim='x_center') will still produce the same surprising behavior as today, but xr.concat([ds1, ds2], dim='x_center', join='strict') would raise an error. The issue I see here is that strict may not really be a join mode but rather a whole new parameter: we could want strict dimension-name checks regardless of whether the join type is inner, outer, etc. For now, strict is really just an even more restrictive exact, adding more checks at multiple places inside the aligner.py module.
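The separation described above (a strict dimension check that is independent of the join type) can be sketched on the caller's side. The function check_strict_dims below is hypothetical: it is neither an xarray API nor the implementation in the PR. It only compares dimension sizes, so it can run before xr.concat with any join value; because it ignores coordinate values, equal-sized but differently labelled dimensions would still be aligned by an outer join.

```python
from collections.abc import Mapping, Sequence


def check_strict_dims(input_sizes: Sequence[Mapping[str, int]], dim: str) -> None:
    """Hypothetical 'strict' pre-check, separate from any join mode.

    Every dimension other than `dim` must be present in all inputs with the
    same size; otherwise a ValueError is raised. Only sizes are compared,
    not coordinate values (checking index values is what join='exact' does).
    """
    other_dims = set()
    for sizes in input_sizes:
        other_dims.update(sizes)
    other_dims.discard(dim)

    for name in sorted(other_dims):
        # A dimension missing from some input shows up here as None.
        seen = {sizes.get(name) for sizes in input_sizes}
        if len(seen) != 1:
            raise ValueError(
                f"dimension {name!r} differs across inputs "
                f"(sizes: {sorted(seen, key=str)}); refusing to concatenate "
                f"along {dim!r} only"
            )


# With xarray objects this would run before the actual call, e.g.
#   check_strict_dims([ds1.sizes, ds2.sizes], "x_center")
#   xr.concat([ds1, ds2], dim="x_center")
try:
    check_strict_dims(
        [{"x_center": 3, "x_outer": 4}, {"x_center": 3, "x_outer": 3}], "x_center"
    )
except ValueError as err:
    print("rejected:", err)
```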
test_concat_join_non_coordinate_variables tests:
ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)
This test just enforces that the expected behavior happens.
Wouldn't join="exact" raise an error here?
Indeed, join='exact' raises an error:
import xarray as xr
ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)
ds2 = xr.Dataset(
    coords={
        'x_center': ('x_center', [4, 5, 6]),
        'x_outer': ('x_outer', [4.5, 5.5, 6.5]),
    },
)
xr.concat([ds1, ds2], dim='x_center', join='exact')
ValueError: cannot align objects with join='exact' where index/labels/sizes are not equal along these coordinates (dimensions): 'x_outer' ('x_outer',)
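join='exact' only refuses to align the dimensions that are not being concatenated; it does not block concatenation when those dimensions already agree. A minimal sketch, assuming the pieces genuinely share the same x_outer labels (ds_left, ds_right and ds_shifted are illustrative names, not from the issue):

```python
import xarray as xr

# Two pieces that share identical x_outer labels and differ only along x_center.
ds_left = xr.Dataset(
    coords={
        "x_center": ("x_center", [1, 2, 3]),
        "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5]),
    },
)
ds_right = xr.Dataset(
    coords={
        "x_center": ("x_center", [4, 5, 6]),
        "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5]),
    },
)

# Succeeds: x_center is concatenated, x_outer is kept as-is (no outer join).
ok = xr.concat([ds_left, ds_right], dim="x_center", join="exact")
print(dict(ok.sizes))

# The staggered-grid case from the issue: x_outer differs, so exact alignment refuses.
ds_shifted = xr.Dataset(
    coords={
        "x_center": ("x_center", [4, 5, 6]),
        "x_outer": ("x_outer", [4.5, 5.5, 6.5]),
    },
)
refused = False
try:
    xr.concat([ds_left, ds_shifted], dim="x_center", join="exact")
except ValueError as err:  # the alignment error shown above is a ValueError
    refused = True
    print("refused:", err)
```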