Rolling mean with bool performs sum

Open chandley564 opened this issue 1 year ago • 2 comments

What happened?

Taking a rolling mean of a DataArray with dtype=bool doesn't behave as I would expect. Rather than converting to int and taking the rolling mean, the result is equivalent to converting to int and then taking a rolling sum.
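For reference, here is a dependency-free sketch of the centered, window-3 reduction described above. It only illustrates the arithmetic and is not xarray's implementation; `centered_rolling` is a hypothetical helper written for this illustration, and it assumes an odd window size.

```python
import math


def centered_rolling(values, window, reduce):
    """Apply `reduce` to each full centered window; edges without a full window get NaN.

    Hypothetical helper for illustration only; assumes `window` is odd.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = i - half, i + half + 1
        if lo < 0 or hi > len(values):
            out.append(math.nan)
        else:
            # Booleans are converted to int before reducing.
            out.append(reduce([int(v) for v in values[lo:hi]]))
    return out


flags = [False, True, True, False, True, False]

rolling_sum = centered_rolling(flags, 3, sum)
rolling_mean = centered_rolling(flags, 3, lambda w: sum(w) / len(w))

print("Rolling sum: ", rolling_sum)
print("Rolling mean:", rolling_mean)
```

The two results differ by exactly the window size (3) wherever a full window exists, which is the gap between the bool output and the expected output in this report.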

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import numpy as np
from xarray import DataArray

int_raster = DataArray(
    data=[0, 1, 1, 0, 1, 0],
    dims=("x"),
)

expected_rolling_mean = DataArray(
    data=[np.nan, 2 / 3, 2 / 3, 2 / 3, 1 / 3, np.nan],
    dims=("x"),
)

bool_raster = int_raster.astype(bool)

int_rolling_mean = int_raster.rolling(x=3, center=True).mean()
bool_rolling_mean = bool_raster.rolling(x=3, center=True).mean()
rolling_sum = int_raster.rolling(x=3, center=True).sum()

print("Expected: \n", expected_rolling_mean, "\n")
print("With int dtype: \n", int_rolling_mean, "\n")
print("With bool dtype: \n", bool_rolling_mean, "\n")
print("Rolling sum: \n", rolling_sum)

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Expected: 
 <xarray.DataArray (x: 6)> Size: 48B
array([       nan, 0.66666667, 0.66666667, 0.66666667, 0.33333333,
              nan])
Dimensions without coordinates: x 

With int dtype: 
 <xarray.DataArray (x: 6)> Size: 48B
array([       nan, 0.66666667, 0.66666667, 0.66666667, 0.33333333,
              nan])
Dimensions without coordinates: x 

With bool dtype: 
 <xarray.DataArray (x: 6)> Size: 48B
array([nan,  2.,  2.,  2.,  1., nan])
Dimensions without coordinates: x 

Rolling sum: 
 <xarray.DataArray (x: 6)> Size: 48B
array([nan,  2.,  2.,  2.,  1., nan])
Dimensions without coordinates: x

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 154 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_New Zealand', '1252')
libhdf5: None
libnetcdf: None

xarray: 2024.2.0
pandas: 2.1.4
numpy: 1.26.2
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.3.1
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 23.2.1
conda: None
pytest: 7.4.3
mypy: None
IPython: 8.18.1
sphinx: 6.2.1

chandley564, Mar 21 '24 23:03

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

welcome[bot], Mar 21 '24 23:03

FWIW this seems to be correct under numbagg or bottleneck; so it's an issue with the naive xarray routines. We could just raise an error there.

Expected:
 <xarray.DataArray (x: 6)> Size: 48B
array([       nan, 0.66666667, 0.66666667, 0.66666667, 0.33333333,
              nan])
Dimensions without coordinates: x

With int dtype:
 <xarray.DataArray (x: 6)> Size: 48B
array([       nan, 0.66666667, 0.66666667, 0.66666667, 0.33333333,
              nan])
Dimensions without coordinates: x

With bool dtype:
 <xarray.DataArray (x: 6)> Size: 48B
array([       nan, 0.66666667, 0.66666667, 0.66666667, 0.33333333,
              nan])
Dimensions without coordinates: x

Rolling sum:
 <xarray.DataArray (x: 6)> Size: 48B
array([nan,  2.,  2.,  2.,  1., nan])
Dimensions without coordinates: x
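Until the bool path is fixed or made to raise, a possible workaround (a sketch based only on the outputs reported in this thread, where the int dtype gives the expected mean) is to cast to a numeric dtype before calling `rolling`. The variable name `workaround_mean` is just for illustration.

```python
import numpy as np
from xarray import DataArray

bool_raster = DataArray(
    data=[False, True, True, False, True, False],
    dims="x",
)

# Cast bool -> int first; per the report, the int dtype yields the expected mean
# even without numbagg or bottleneck installed.
workaround_mean = bool_raster.astype(int).rolling(x=3, center=True).mean()

expected = np.array([np.nan, 2 / 3, 2 / 3, 2 / 3, 1 / 3, np.nan])

# Raises AssertionError if the values differ; NaNs at matching positions compare equal.
np.testing.assert_allclose(workaround_mean.values, expected, equal_nan=True)
print(workaround_mean)
```

Installing numbagg or bottleneck, as noted above, appears to be another way to get the correct result for bool input.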

max-sixty, Mar 22 '24 01:03