
Race condition in `align`?

Open crusaderky opened this issue 3 months ago • 2 comments

What happened?

While stress-testing a personal xarray-based project with pytest-run-parallel, which runs my test suite many times in parallel over different threads, I'm frequently getting a race condition: https://github.com/crusaderky/pathfinder2e_stats/actions/runs/19407113819/job/55523433199

pathfinder2e_stats/damage.py:381: in damage
    _, persistent_damage_DC = xarray.align(
.pixi/envs/nogil/lib/python3.14t/site-packages/xarray/structure/alignment.py:967: in align
    aligner.align()
.pixi/envs/nogil/lib/python3.14t/site-packages/xarray/structure/alignment.py:667: in align
    self.reindex_all()
.pixi/envs/nogil/lib/python3.14t/site-packages/xarray/structure/alignment.py:638: in reindex_all
    self.results = tuple(
.pixi/envs/nogil/lib/python3.14t/site-packages/xarray/structure/alignment.py:625: in _reindex_one
    dim_pos_indexers = self._get_dim_pos_indexers(matching_indexes)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.pixi/envs/nogil/lib/python3.14t/site-packages/xarray/structure/alignment.py:556: in _get_dim_pos_indexers
    indexers = obj_idx.reindex_like(aligned_idx, **self.reindex_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = PandasIndex(Index(['bleed', 'fire', 'electricity'], dtype='str', name='damage_type'))
other = PandasIndex(Index(['bleed', 'cold', 'fire'], dtype='str', name='damage_type'))
method = None, tolerance = None

    def reindex_like(
        self, other: Self, method=None, tolerance=None
    ) -> dict[Hashable, Any]:
        if not self.index.is_unique:
>           raise ValueError(
                f"cannot reindex or align along dimension {self.dim!r} because the "
                "(pandas) index has duplicate values"
            )
E           ValueError: cannot reindex or align along dimension 'damage_type' because the (pandas) index has duplicate values

Sadly, I cannot reproduce the failure locally. For some reason it happens only in CI, even though the whole stack and platform are identical between CI and my local box. For this reason, I'm unsure whether the issue is in xarray or in pandas.

The test that is failing in my project is calling

_, b2 = xarray.align(a, b, join="left")

where a is a thread-local Dataset and b is a DataArray that is defined in the global scope and is shared among all the threads that run xarray.align in parallel. The two objects are aligned along a string index, with object dtype in a and dtype=<U11 in b. All inputs are deterministic.

Minimal reproducer (which, however, I have not managed to make demonstrate the issue, as explained above):

import xarray
from numpy.testing._private.utils import run_threaded

# b is shared among all threads
b = xarray.DataArray([4, 5, 6], dims=["x"], coords={"x": ["b", "f", "c"]})

def f():
    # a is thread-local
    a = xarray.Dataset(coords={"x": ["b", "f", "e"]})
    a.coords["x"] = a.coords["x"].astype(object)
    _, b2 = xarray.align(a, b, join="left", fill_value=-1)

run_threaded(f)
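
Since run_threaded lives in a private numpy module, a stdlib-only stand-in may be easier to adjust while hunting for the race. The sketch below is not numpy's implementation; run_in_threads, n_threads and n_iters are hypothetical names chosen for illustration. It releases all threads through a barrier and collects every exception instead of stopping at the first one.

```python
import threading


def run_in_threads(func, n_threads=8, n_iters=100):
    """Call func() n_iters times in each of n_threads threads.

    Hypothetical helper: returns the list of exceptions raised by func.
    """
    barrier = threading.Barrier(n_threads)
    errors = []
    lock = threading.Lock()

    def worker():
        # Release all threads together to maximise overlap between calls.
        barrier.wait()
        for _ in range(n_iters):
            try:
                func()
            except Exception as exc:
                with lock:
                    errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

With the reproducer above, errors = run_in_threads(f) followed by print(len(errors)) would show how many calls, if any, hit the ValueError.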

My gut feeling is that there is a brief moment where the input DataArray b is temporarily updated in place. However, I've audited the xarray code and did not spot anything untoward. From what I understand:

  1. align calls Aligner.reindex_all,
  2. which calls Aligner._reindex_one,
  3. which calls DataArray._reindex_callback on b,
  4. which calls DataArray._to_temp_dataset -> Dataset._reindex_callback -> DataArray._from_temp_dataset

Notably, the Variable instances in Dataset._reindex_callback are the same objects as those in the shared b object.

Environment

  • python-freethreading 3.14.0
  • pandas 3.0.0.dev0+2714.gfa5b90a079
  • xarray 2025.10.1
  • ubuntu-latest github actions CI runners

crusaderky, Nov 16 '25 15:11

I'm inclined to blame pandas (https://github.com/pandas-dev/pandas/issues/2728, https://github.com/pydata/xarray/issues/9836) and I suspect there's at least one shallow-copy somewhere in that align code path.

One way to check would be to write a dummy custom index class with no Pandas involved.
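
The converse experiment, pandas with no xarray involved, can be sketched as well. The traceback fails on self.index.is_unique, so one option is to race several threads on the first access of that property on a freshly built, shared index. The helper below is generic and stdlib-only; probe_shared, make_obj and check are hypothetical names. That the first access is where a race would show up is an assumption, not something established in this thread.

```python
import threading


def probe_shared(make_obj, check, n_threads=8, n_rounds=200):
    """Count falsy check(obj) results when threads race on a shared object.

    Hypothetical helper: make_obj() builds a fresh shared object per round,
    so any lazily computed state starts empty each time.
    """
    bad = 0
    lock = threading.Lock()

    for _ in range(n_rounds):
        obj = make_obj()
        barrier = threading.Barrier(n_threads)

        def worker():
            nonlocal bad
            # All threads evaluate the check at (nearly) the same moment.
            barrier.wait()
            if not check(obj):
                with lock:
                    bad += 1

        threads = [threading.Thread(target=worker) for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    return bad
```

For example, probe_shared(lambda: pandas.Index(["bleed", "fire", "electricity"], name="damage_type"), lambda idx: idx.is_unique) should return 0 unless the uniqueness check misbehaves under concurrency.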

dcherian, Nov 17 '25 04:11

My comments on that issue come from xarray usage. If I remember correctly, the examples there are the functions that xarray calls.

alippai, Dec 03 '25 06:12