xarray
xarray copied to clipboard
Dataset.weighted along a dimension not on weights errors
What happened?
ds.weighted(weights).mean(dims)
errors when reducing over a dimension that is neither on the weights
nor on the variable.
What did you expect to happen?
This used to work and was "broken" by #8606. However, we may want to fix this by ignoring (?) those data vars instead (#7027).
Minimal Complete Verifiable Example
import xarray as xr
ds = xr.Dataset({"a": (("y", "x"), [[1, 2]]), "scalar": 1})
weights = xr.DataArray([1, 2], dims="x")
ds.weighted(weights).mean("y")
MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
ValueError Traceback (most recent call last)
Cell In[1], line 6
3 ds = xr.Dataset({"a": (("y", "x"), [[1, 2]]), "scalar": 1})
4 weights = xr.DataArray([1, 2], dims="x")
----> 6 ds.weighted(weights).mean("y")
File ~/code/xarray/xarray/util/deprecation_helpers.py:115, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
111 kwargs.update({name: arg for name, arg in zip_args})
113 return func(*args[:-n_extra_args], **kwargs)
--> 115 return func(*args, **kwargs)
File ~/code/xarray/xarray/core/weighted.py:497, in Weighted.mean(self, dim, skipna, keep_attrs)
489 @_deprecate_positional_args("v2023.10.0")
490 def mean(
491 self,
(...)
495 keep_attrs: bool | None = None,
496 ) -> T_Xarray:
--> 497 return self._implementation(
498 self._weighted_mean, dim=dim, skipna=skipna, keep_attrs=keep_attrs
499 )
File ~/code/xarray/xarray/core/weighted.py:558, in DatasetWeighted._implementation(self, func, dim, **kwargs)
555 def _implementation(self, func, dim, **kwargs) -> Dataset:
556 self._check_dim(dim)
--> 558 return self.obj.map(func, dim=dim, **kwargs)
File ~/code/xarray/xarray/core/dataset.py:6924, in Dataset.map(self, func, keep_attrs, args, **kwargs)
6922 if keep_attrs is None:
6923 keep_attrs = _get_keep_attrs(default=False)
-> 6924 variables = {
6925 k: maybe_wrap_array(v, func(v, *args, **kwargs))
6926 for k, v in self.data_vars.items()
6927 }
6928 if keep_attrs:
6929 for k, v in variables.items():
File ~/code/xarray/xarray/core/dataset.py:6925, in <dictcomp>(.0)
6922 if keep_attrs is None:
6923 keep_attrs = _get_keep_attrs(default=False)
6924 variables = {
-> 6925 k: maybe_wrap_array(v, func(v, *args, **kwargs))
6926 for k, v in self.data_vars.items()
6927 }
6928 if keep_attrs:
6929 for k, v in variables.items():
File ~/code/xarray/xarray/core/weighted.py:286, in Weighted._weighted_mean(self, da, dim, skipna)
278 def _weighted_mean(
279 self,
280 da: T_DataArray,
281 dim: Dims = None,
282 skipna: bool | None = None,
283 ) -> T_DataArray:
284 """Reduce a DataArray by a weighted ``mean`` along some dimension(s)."""
--> 286 weighted_sum = self._weighted_sum(da, dim=dim, skipna=skipna)
288 sum_of_weights = self._sum_of_weights(da, dim=dim)
290 return weighted_sum / sum_of_weights
File ~/code/xarray/xarray/core/weighted.py:276, in Weighted._weighted_sum(self, da, dim, skipna)
268 def _weighted_sum(
269 self,
270 da: T_DataArray,
271 dim: Dims = None,
272 skipna: bool | None = None,
273 ) -> T_DataArray:
274 """Reduce a DataArray by a weighted ``sum`` along some dimension(s)."""
--> 276 return self._reduce(da, self.weights, dim=dim, skipna=skipna)
File ~/code/xarray/xarray/core/weighted.py:231, in Weighted._reduce(da, weights, dim, skipna)
227 da = da.fillna(0.0)
229 # `dot` does not broadcast arrays, so this avoids creating a large
230 # DataArray (if `weights` has additional dimensions)
--> 231 return dot(da, weights, dim=dim)
File ~/code/xarray/xarray/util/deprecation_helpers.py:140, in deprecate_dims.<locals>.wrapper(*args, **kwargs)
132 emit_user_level_warning(
133 "The `dims` argument has been renamed to `dim`, and will be removed "
134 "in the future. This renaming is taking place throughout xarray over the "
(...)
137 PendingDeprecationWarning,
138 )
139 kwargs["dim"] = kwargs.pop("dims")
--> 140 return func(*args, **kwargs)
File ~/code/xarray/xarray/core/computation.py:1885, in dot(dim, *arrays, **kwargs)
1883 dim = tuple(d for d, c in dim_counts.items() if c > 1)
1884 else:
-> 1885 dim = parse_dims(dim, all_dims=tuple(all_dims))
1887 dot_dims: set[Hashable] = set(dim)
1889 # dimensions to be parallelized
File ~/code/xarray/xarray/core/utils.py:1046, in parse_dims(dim, all_dims, check_exists, replace_none)
1044 dim = (dim,)
1045 if check_exists:
-> 1046 _check_dims(set(dim), set(all_dims))
1047 return tuple(dim)
File ~/code/xarray/xarray/core/utils.py:1131, in _check_dims(dim, all_dims)
1129 if wrong_dims:
1130 wrong_dims_str = ", ".join(f"'{d!s}'" for d in wrong_dims)
-> 1131 raise ValueError(
1132 f"Dimension(s) {wrong_dims_str} do not exist. Expected one or more of {all_dims}"
1133 )
ValueError: Dimension(s) 'y' do not exist. Expected one or more of {'x'}
Anything else we need to know?
No response
Environment
Newest main (i.e. 2024.01)
Related issues:
- #8583
- #7027
- #6952
Hello,
I started digging into this but kind of get lost into the code. So, I changed approach. I assume that the weighted reductions behaviours should be aligned as much as possible on the regular non-weighted reductions, which is currently not the case. So somehow, the logic has to be aligned, potentially refactored away? It seems this family of bugs (or better said, unexpected behaviours) root for diverging implementations and lack of delegation to default existing logic. Of course, this is easier said than done!
I have written a test describing common scenarios, comparing non-weighted (regular) reductions vs weighted reductions You can find the code of the test (test_non_weighted_vs_weighted_behaviour
) here for more details: https://github.com/etienneschalk/xarray/blob/ab0488caf3e81e37e02beeb709cf0c46f72a7c83/xarray/tests/test_weighted.py#L916 (:warning: this is work in progress, code not clean, experimenting with breaking things!)
Here is the matrix of behaviour summarizing the testing approach:
The "Weighted (expect)" column is identical to "Non-weighted (reference)", I put it to emphasizes that the behaviours should match, but maybe not 100% in case subtle unavoidable variations are discovered? For now I made the assumption that it should be a perfect match.
Scenario ID / Behaviour | Non-weighted (reference) | Weighted (expected) | Weighted (current) | Is current aligned on reference? |
---|---|---|---|---|
(1) DataArrayxr.DataArray(1) |
:x: r"'x' not found in array dimensions ()" (Handled on DataArray level) |
:x: r"'x' not found in array dimensions ()" (Handled on DataArray level) |
:x: r"'x' not found in array dimensions ()" (Handled on DataArray level) |
:white_check_mark: |
(2) Dataset without reduction dimxr.Dataset({"var": 1}) |
:x: r"Dimensions ('x',) not found in data dimensions ()" (Handled on Dataset level) |
:x: r"Dimensions ('x',) not found in data dimensions ()" (Handled on Dataset level) |
:x: r"'x' not found in array dimensions ()" (Handled on DataArray level) |
:x: |
(3) Dataset with reduction dimxr.Dataset({"var": 1, "x_dependant": ("x", [2, 4])}) |
:white_check_mark: (Handled on Dataset level) |
:white_check_mark: (Handled on Dataset level) |
:x: r"'x' not found in array dimensions ()" (Handled on DataArray level) |
:x: |
So, according to this table, the issue is that the weighted behaviour for Dataset is not as specialized than the non-weighted one. Indeed, the Dataset non-weighted dimensions allow to ignore a reduction for a variable that does not depends on the reducing dimension. The only check seems to be that the reduced Dataset should at least contain one variable dependant on the reducing dimension. This "lenient" approach seems more user friendly as it is common to have Datasets with variables of various dimensions, so ignoring some of these variables simplifies the user experience, while reductions over DataArrays are more strict, since the user should know the dimensions of the DataArray that is being reduced (in opposition to plenty of variables in a Dataset).
The question now would be: how to align weighted reductions with their non-weighted counterparts, so that "UX choices" are made only once (for non-weighted that is the reference)?