slicing DataArray with RangeIndex coordinate can put coordinate in inconsistent state

Open anntzer opened this issue 5 months ago • 0 comments

What happened?

Slicing a DataArray with RangeIndex coordinate can put that coordinate in an internally inconsistent state.

What did you expect to happen?

No internally inconsistent state.

Minimal Complete Verifiable Example

import numpy as np, xarray as xr, xarray.indexes

n = 30
step = 1
da = xr.DataArray(np.zeros(n), dims=["x"])
da = da.assign_coords(
    xr.Coordinates.from_xindex(
        xr.indexes.RangeIndex.linspace(0, (n - 1) * step, n, dim="x")))
sub = da.isel(x=slice(4, None, 3))

print(da)
print(da.shape, da.x.shape)  # both have shape (30,)
da.expand_dims({"y": [0]}, 0)  # ok

print()

print(sub)
print(sub.shape, sub.x.shape)  # sub has shape (9,) but sub.x has shape (8,)
sub.expand_dims({"y": [0]}, 0)  # crashes, likely due to internally inconsistent state

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

<xarray.DataArray (x: 30)> Size: 240B
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
  * x        (x) float64 240B 0.0 1.0 2.0 3.0 4.0 ... 25.0 26.0 27.0 28.0 29.0
Indexes:
    x        RangeIndex (start=0, stop=30, step=1)
(30,) (30,)

<xarray.DataArray (x: 9)> Size: 72B
array([0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
  * x        (x) float64 64B 4.0 7.0 10.0 13.0 16.0 19.0 22.0 25.0
Indexes:
    x        RangeIndex (start=4, stop=28, step=3)
(9,) (8,)
Traceback (most recent call last):
  File "/private/tmp/test.py", line 19, in <module>
    sub.expand_dims({"y": [0]}, 0)  # crashes
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/path/to/python3.13/site-packages/xarray/core/dataarray.py", line 2707, in expand_dims
    ds = self._to_temp_dataset().expand_dims(
         ~~~~~~~~~~~~~~~~~~~~~^^
  File "/path/to/python3.13/site-packages/xarray/core/dataarray.py", line 581, in _to_temp_dataset
    return self._to_dataset_whole(name=_THIS_ARRAY, shallow_copy=False)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/python3.13/site-packages/xarray/core/dataarray.py", line 648, in _to_dataset_whole
    return Dataset._construct_direct(variables, coord_names, indexes=indexes)
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/python3.13/site-packages/xarray/core/dataset.py", line 776, in _construct_direct
    dims = calculate_dimensions(variables)
  File "/path/to/python3.13/site-packages/xarray/core/variable.py", line 3044, in calculate_dimensions
    raise ValueError(
    ...<2 lines>...
    )
ValueError: conflicting sizes for dimension 'x': length 9 on <this-array> and length 8 on {'x': 'x'}

Anything else we need to know?

From a quick look, the bug occurs at https://github.com/pydata/xarray/blob/99a1ad2144cfeb44b5fff1aa1cebd6dfcf962672/xarray/indexes/range_index.py#L87 where the formula for the new size is wrong: it floors the division where the length of a strided range requires rounding up. The correct formula can be copied e.g. from https://github.com/python/cpython/blob/569fc6870f048cb75469ae3cacb6ebcf5172a10e/Objects/rangeobject.c#L950-L976; the corresponding patch for xarray is:

diff --git i/xarray/indexes/range_index.py w/xarray/indexes/range_index.py
index 2b9a5e50..04459e79 100644
--- i/xarray/indexes/range_index.py
+++ w/xarray/indexes/range_index.py
@@ -84,7 +84,7 @@ class RangeCoordinateTransform(CoordinateTransform):
         # TODO: support reverse transform (i.e., start > stop)?
         assert sl.start < sl.stop

-        new_size = (sl.stop - sl.start) // sl.step
+        new_size = (sl.stop - sl.start - 1) // sl.step + 1
         new_start = self.start + sl.start * self.step
         new_stop = new_start + new_size * sl.step * self.step

which appears to fix the problem for me.
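
For reference, here is a minimal pure-Python sketch comparing the two size formulas against `len(range(...))`, without needing xarray. The helper names `old_size` and `new_size` are hypothetical and only mirror the two lines of the diff; the sketch also assumes the slice has already been normalized to concrete bounds (as `slice.indices` does) with a positive step and `start < stop`, which is what the `assert` in the patched function suggests.

```python
def old_size(start, stop, step):
    """Formula on the removed line of the diff: floors the division."""
    return (stop - start) // step


def new_size(start, stop, step):
    """Formula on the added line of the diff: ceil((stop - start) / step).

    Only meant for step > 0 and start < stop.
    """
    return (stop - start - 1) // step + 1


# Case from the example above: isel(x=slice(4, None, 3)) on 30 elements.
start, stop, step = slice(4, None, 3).indices(30)
expected = len(range(start, stop, step))

print(start, stop, step)            # 4 30 3
print(expected)                     # 9
print(old_size(start, stop, step))  # 8
print(new_size(start, stop, step))  # 9

# Brute-force check over small non-empty slices with a positive step.
mismatches_old = 0
for n in range(1, 40):
    for a in range(n):
        for b in range(a + 1, n + 1):
            for s in range(1, n + 1):
                want = len(range(a, b, s))
                assert new_size(a, b, s) == want
                mismatches_old += old_size(a, b, s) != want

print(mismatches_old > 0)           # True
```

The old formula only agrees with `range` when `stop - start` is an exact multiple of `step`, which is presumably why the problem does not show up for unit-step slices.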

Environment

INSTALLED VERSIONS

commit: None
python: 3.13.0 | packaged by conda-forge | (main, Oct 17 2024, 12:32:35) [Clang 17.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 24.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: None

xarray: 2025.6.1
pandas: 2.3.0
numpy: 2.3.1
scipy: 1.16.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.13.0
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.10.0
distributed: None
matplotlib: 3.11.0.dev985+gbebb26384f
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2024.9.0
cupy: None
pint: 0.24.4
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.2.0
pip: 24.3.1
conda: None
pytest: 8.3.5
mypy: 1.15.0
IPython: 9.3.0
sphinx: 8.2.3

anntzer, Jun 23 '25 18:06