
xr.concat concatenates along dimensions that it wasn't asked to

Open · TomNicholas opened this issue 1 year ago · 4 comments

What happened?

Here are two toy datasets designed to represent sections of a dataset that has variables living on a staggered grid. This type of dataset is common in fluid modelling (it's why xGCM exists).

import xarray as xr

ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer':  ('x_outer',  [0.5, 1.5, 2.5, 3.5]),  
    },
)

ds2 = xr.Dataset(
    coords={
        'x_center': ('x_center', [4, 5, 6]),
        'x_outer':  ('x_outer',  [4.5, 5.5, 6.5]),  
    },
)

Calling xr.concat on these with dim='x_center' happily concatenates them:

xr.concat([ds1, ds2], dim='x_center')
<xarray.Dataset>
Dimensions:   (x_outer: 7, x_center: 6)
Coordinates:
  * x_outer   (x_outer) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5
  * x_center  (x_center) int64 1 2 3 4 5 6
Data variables:
    *empty*

but notice that the returned result has been concatenated along both x_center and x_outer.
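A minimal sketch to check this. It passes join='outer' explicitly, which is the default in xarray 2023.8.0; spelling it out is an assumption made so the example does not depend on any later change of defaults. The x_outer index of the output should be the sorted union of the two inputs' x_outer labels.

```python
import xarray as xr

# Same toy staggered-grid datasets as above
ds1 = xr.Dataset(
    coords={
        "x_center": ("x_center", [1, 2, 3]),
        "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5]),
    },
)
ds2 = xr.Dataset(
    coords={
        "x_center": ("x_center", [4, 5, 6]),
        "x_outer": ("x_outer", [4.5, 5.5, 6.5]),
    },
)

# join="outer" is spelled out rather than relying on the default
result = xr.concat([ds1, ds2], dim="x_center", join="outer")

print(dict(result.sizes))                 # sizes of both dimensions
print(result["x_outer"].values.tolist())  # union of both x_outer indexes
```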

What did you expect to happen?

I did not expect this to work. I definitely didn't expect the datasets to be concatenated along a dimension I didn't ask them to be concatenated along (i.e. x_outer).

What I expected was that, since coords='different' by default, xarray would attempt to concatenate both variables along the x_center dimension. That would have succeeded for the x_center variable but failed for the x_outer variable. Indeed, if I rename the variables so that they are no longer coordinate variables, that is what happens:

import xarray as xr

ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b':  ('x_outer',  [0.5, 1.5, 2.5, 3.5]),  
    },
)

ds2 = xr.Dataset(
    data_vars={
        'a': ('x_center', [4, 5, 6]),
        'b':  ('x_outer',  [4.5, 5.5, 6.5]),  
    },
)
xr.concat([ds1, ds2], dim='x_center', data_vars='different') 
ValueError: cannot reindex or align along dimension 'x_outer' because of conflicting dimension sizes: {3, 4}
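The failure can also be caught explicitly. A minimal sketch reproducing the second example; it assumes only that the size mismatch surfaces as a ValueError mentioning x_outer, as in the traceback above (the exact wording may vary between xarray versions).

```python
import xarray as xr

# Same shapes as the staggered-grid example, but stored as data variables,
# so neither x_center nor x_outer has an index
ds1 = xr.Dataset(
    data_vars={
        "a": ("x_center", [1, 2, 3]),
        "b": ("x_outer", [0.5, 1.5, 2.5, 3.5]),
    },
)
ds2 = xr.Dataset(
    data_vars={
        "a": ("x_center", [4, 5, 6]),
        "b": ("x_outer", [4.5, 5.5, 6.5]),
    },
)

try:
    xr.concat([ds1, ds2], dim="x_center", data_vars="different")
    error_message = None
except ValueError as err:
    # Without an index, x_outer cannot be reindexed, so sizes 4 vs 3 are fatal
    error_message = str(err)

print(error_message)
```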

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I was trying to create an example for which you would need the automatic combined concat/merge that happens within xr.combine_by_coords.

Environment

xarray 2023.8.0

TomNicholas · Sep 25 '23 18:09

This is a consequence of the alignment behavior described in #6806.
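The alignment step can be observed on its own. A minimal sketch, assuming a simplified model in which xr.concat first aligns its inputs along every dimension except dim; it uses only the public xr.align with its join and exclude parameters. With join='outer', both inputs get padded to the 7-element union along x_outer, while x_center is left untouched.

```python
import xarray as xr

ds1 = xr.Dataset(
    coords={
        "x_center": ("x_center", [1, 2, 3]),
        "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5]),
    },
)
ds2 = xr.Dataset(
    coords={
        "x_center": ("x_center", [4, 5, 6]),
        "x_outer": ("x_outer", [4.5, 5.5, 6.5]),
    },
)

# Align on every dimension except the one we intend to concatenate along
aligned1, aligned2 = xr.align(ds1, ds2, join="outer", exclude=["x_center"])

print(dict(aligned1.sizes), dict(aligned2.sizes))
```

Once both inputs share the same x_outer index, concatenating along x_center gives exactly the result shown in the issue.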

TomNicholas · Sep 27 '23 16:09

The PR suggests using the proposed join='strict' kwarg.

test_concat_join_coordinate_variables_non_asked_dims tests:

ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer':  ('x_outer',  [0.5, 1.5, 2.5, 3.5]),  
    },
)

xr.concat([ds1, ds2], dim='x_center') will still produce the same surprising behavior, but xr.concat([ds1, ds2], dim='x_center', join='strict') would raise an error. The issue I see here is that strict may not really be a join mode but rather a separate parameter: we might want strict dimension-name checks regardless of whether the join type is inner, outer, etc. For now, strict is really just an even more restrictive exact that adds extra checks in several places inside the aligner.py module.
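To illustrate the "separate parameter" idea, here is a user-level sketch. concat_strict and _indexes_off_dim are hypothetical helpers, not xarray API, and "strict" is assumed to mean "no index off the concatenation dimension may differ". The check runs before xr.concat and is independent of whichever join mode is passed through.

```python
import xarray as xr


def _indexes_off_dim(ds, dim):
    # Hypothetical helper: pandas indexes of indexed coordinates not lying along `dim`
    return {name: idx for name, idx in ds.indexes.items() if dim not in ds[name].dims}


def concat_strict(datasets, dim, join="outer"):
    """Hypothetical strict concat (not an xarray API): refuse to proceed if any
    index off the concatenation dimension differs, whatever `join` is."""
    reference = _indexes_off_dim(datasets[0], dim)
    for ds in datasets[1:]:
        other = _indexes_off_dim(ds, dim)
        if set(other) != set(reference):
            raise ValueError(f"indexed coordinates differ: {sorted(reference)} vs {sorted(other)}")
        for name, idx in reference.items():
            if not idx.equals(other[name]):
                raise ValueError(f"index {name!r} differs between datasets; only {dim!r} may differ")
    # The strict check above does not depend on the join mode passed through here
    return xr.concat(datasets, dim=dim, join=join)


ds1 = xr.Dataset(coords={"x_center": ("x_center", [1, 2, 3]),
                         "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5])})
ds2 = xr.Dataset(coords={"x_center": ("x_center", [4, 5, 6]),
                         "x_outer": ("x_outer", [4.5, 5.5, 6.5])})
# Hypothetical compatible section: only x_center differs from ds1
ds_ok = xr.Dataset(coords={"x_center": ("x_center", [4, 5, 6]),
                           "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5])})

try:
    concat_strict([ds1, ds2], dim="x_center")
    strict_error = None
except ValueError as err:
    strict_error = str(err)

combined = concat_strict([ds1, ds_ok], dim="x_center", join="inner")
print(strict_error)
print(dict(combined.sizes))
```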

test_concat_join_non_coordinate_variables tests:

ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b':  ('x_outer',  [0.5, 1.5, 2.5, 3.5]),  
    },
)

These tests just enforce that the expected behavior happens.

etienneschalk · Feb 04 '24 20:02

Wouldn't join="exact" raise an error here?

dcherian · Feb 14 '24 17:02

Indeed, join='exact' raises an error:

import xarray as xr

ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer':  ('x_outer',  [0.5, 1.5, 2.5, 3.5]),  
    },
)

ds2 = xr.Dataset(
    coords={
        'x_center': ('x_center', [4, 5, 6]),
        'x_outer':  ('x_outer',  [4.5, 5.5, 6.5]),  
    },
)
xr.concat([ds1, ds2], dim='x_center', join='exact')
ValueError: cannot align objects with join='exact' where index/labels/sizes are not equal along these coordinates (dimensions): 'x_outer' ('x_outer',)
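join='exact' does not block the legitimate case either. A minimal sketch, using a hypothetical second section (ds_same_outer, made up for illustration) whose x_outer labels match ds1. Only the concatenation dimension differs, so the same call succeeds.

```python
import xarray as xr

ds1 = xr.Dataset(
    coords={
        "x_center": ("x_center", [1, 2, 3]),
        "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5]),
    },
)
# Hypothetical second section: new x_center labels, identical x_outer labels
ds_same_outer = xr.Dataset(
    coords={
        "x_center": ("x_center", [4, 5, 6]),
        "x_outer": ("x_outer", [0.5, 1.5, 2.5, 3.5]),
    },
)

# x_center is the concat dimension, and x_outer matches exactly,
# so the exact join has nothing to reject
result = xr.concat([ds1, ds_same_outer], dim="x_center", join="exact")
print(dict(result.sizes))
```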

etienneschalk · Feb 14 '24 20:02