[Explore]: Investigate Xarray's new behavior for `compat` and `join`

Open · tomvothecoder opened this issue 3 months ago • 0 comments

Is your feature request related to a problem?

Related to #798.

Full Warning

FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name.

To opt in to the new defaults and get rid of this warning now, use: xr.set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly when calling merge/combine functions.
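The two opt-in routes named in the warning can be sketched as follows. This is a minimal illustration, not xCDAT code: the toy datasets and the variable name `tas` are made up, `join="outer"` is passed only because it has been the long-standing default for `xr.merge`, and the `try`/`except` assumes that xarray releases predating the `use_new_combine_kwarg_defaults` option reject it with a `ValueError`.

```python
import numpy as np
import xarray as xr

# Two toy datasets (hypothetical) sharing the variable "tas" on identical coordinates.
time = np.arange(3)
ds_a = xr.Dataset({"tas": ("time", [1.0, 2.0, 3.0])}, coords={"time": time})
ds_b = xr.Dataset({"tas": ("time", [1.0, 2.0, 3.0])}, coords={"time": time})

# Route 1: pass `compat` (and `join`) explicitly on each call.
# This pins the legacy behavior regardless of what the defaults become.
merged_explicit = xr.merge([ds_a, ds_b], compat="no_conflicts", join="outer")

# Route 2: opt in to the new defaults, here scoped to a `with` block.
try:
    with xr.set_options(use_new_combine_kwarg_defaults=True):
        merged_new = xr.merge([ds_a, ds_b])
except ValueError:
    # Assumption: this xarray release does not know the option yet.
    merged_new = None
```

Route 1 keeps results reproducible across xarray versions, at the cost of touching every merge/combine call site. Route 2 is a single switch, but it changes behavior for every combine call made while the option is active.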

Why xarray changed the default

Related to pydata/xarray#10062

* Faster: avoids costly equality checks on overlapping chunks (a big win for Dask + large datasets).

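* Simpler: most workflows with intentional overlaps (e.g., boundary timesteps) only need one copy of the overlapping variable. Note that `compat='override'` keeps the variable from the first dataset ("first wins"), not the last.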

Trade-offs for xCDAT

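* Preserving legacy behavior (`compat='no_conflicts'`): safer, because merging fails when overlapping non-null values disagree instead of silently masking the mismatch, but the comparison can be slow.

* Following the new default (`compat='override'`): faster and aligned with xarray's direction, but risks hiding subtle data inconsistencies because overlapping values are not compared.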
  
  * Xarray updated the defaults primarily for performance and usability reasons, especially with large Dask-backed arrays ([New defaults for `concat`, `merge`, `combine_*` pydata/xarray#10062](https://github.com/pydata/xarray/pull/10062)).
  * However, these changes may introduce **accuracy risks** when combining datasets whose overlapping values differ, since the mismatch is no longer checked.
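The trade-off above can be made concrete with a small sketch. The datasets and the variable name `tas` are hypothetical and differ in a single value. Both `compat` modes are set explicitly, so the outcome should not depend on which defaults the installed xarray uses.

```python
import xarray as xr

# Hypothetical datasets: same variable and coordinates, one conflicting value.
ds_a = xr.Dataset({"tas": ("time", [1.0, 2.0, 3.0])}, coords={"time": [0, 1, 2]})
ds_b = xr.Dataset({"tas": ("time", [1.0, 2.0, 99.0])}, coords={"time": [0, 1, 2]})

# Legacy behavior: overlapping non-null values must agree, so the mismatch is reported.
try:
    xr.merge([ds_a, ds_b], compat="no_conflicts", join="outer")
    conflict_detected = False
except xr.MergeError:
    conflict_detected = True

# New default: no comparison is made and the variable from the first dataset is kept.
merged = xr.merge([ds_a, ds_b], compat="override", join="outer")
kept = merged["tas"].values.tolist()
```

Here `compat="no_conflicts"` raises a `MergeError`, while `compat="override"` returns the values from `ds_a` and never looks at the `99.0` in `ds_b`. This is the kind of silent masking xCDAT would need to weigh against the performance gain.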

Describe the solution you'd like

We should investigate how this affects I/O with xCDAT. There may be some performance gains by opting into the new combine behaviors, especially with large Dask-backed datasets.

Describe alternatives you've considered

No response

Additional context

No response

tomvothecoder · Oct 01 '25 18:10