cftime resampling error
What happened?
CFTime resampling raises a ValueError for some inputs, for example a daily time axis starting in year 1.
What did you expect to happen?
No error
Minimal Complete Verifiable Example
import dask.array
import numpy as np
import xarray as xr
ds = xr.Dataset(
    {"pr": ("time", dask.array.random.random((10,), chunks=(10,)))},
    coords={"time": xr.date_range("0001-01-01", periods=10, freq="D")},
)
ds.resample(time="ME")
ValueError: Data shape (9,) must match shape of object (10,)
Can you show the output of xr.show_versions()? I am actually not able to reproduce this in the two environments I've tried (one has xarray main installed):
>>> xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:50:49) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2023.4.3.dev863+gce196d56
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: 3.9.0
bottleneck: 1.3.8
dask: 2024.5.2
distributed: 2024.5.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.6.0
cupy: None
pint: None
sparse: 0.15.4
flox: 0.9.8
numpy_groupies: 0.11.1
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: None
sphinx: None
Here are the versions:
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.2.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.0
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.5.2
distributed: 2024.5.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.6.0
cupy: None
pint: 0.23
sparse: 0.15.4
flox: 0.9.8
numpy_groupies: 0.11.1
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: 8.25.0
sphinx: 7.3.7
I get a bunch of these warnings too:
/Users/deepak/miniforge3/envs/xarray-release/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: year=0 was specified - this date/calendar/year zero convention is not supported by CF
  reference = type(date)(year, month, 1)
Yeah, I get those warnings too. We may decide to do something to silence those, but I think that's a separate issue (not that it necessarily excuses it, but I think they have existed for a while for this case).
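For anyone who wants to quiet these in the meantime, here is a minimal sketch using only the standard library `warnings` module. It assumes the warning text matches what appears in the logs in this thread; `noisy_operation` and `run_quietly` are hypothetical names, and `UserWarning` stands in for cftime's `CFWarning` so the snippet runs without cftime installed.

```python
import warnings

# Text copied from the warnings shown in this thread.
YEAR_ZERO_MESSAGE = "this date/calendar/year zero convention is not supported by CF"


def noisy_operation():
    """Hypothetical stand-in for a call that emits the year-zero warning."""
    # UserWarning is used here instead of cftime's CFWarning (an assumption
    # made only to keep the sketch free of third-party imports).
    warnings.warn(YEAR_ZERO_MESSAGE, UserWarning)
    return "done"


def run_quietly(func):
    """Call func with year-zero warnings suppressed; other warnings still show."""
    with warnings.catch_warnings():
        # `message` is a regex matched from the start of the warning text, so
        # the leading `.*` also covers the "year=0 was specified - ..." variant.
        warnings.filterwarnings("ignore", message=".*date/calendar/year zero")
        return func()


result = run_quietly(noisy_operation)
```

The filter is scoped to the `with` block, so it does not hide the same warning elsewhere in a session.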
Weirdly I still cannot reproduce the ValueError with an environment built to be just like yours:
$ python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dask.array; import numpy as np; import xarray as xr
>>> ds = xr.Dataset({"pr": ("time", dask.array.random.random((10,), chunks=(10,)))},coords={"time": xr.date_range("0001-01-01", periods=10, freq="D")},)
>>> ds.resample(time="ME")
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: year=0 was specified - this date/calendar/year zero convention is not supported by CF
reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: this date/calendar/year zero convention is not supported by CF
reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:308: CFWarning: this date/calendar/year zero convention is not supported by CF
return date.replace(year=year, month=month, day=day)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftimeindex.py:563: CFWarning: this date/calendar/year zero convention is not supported by CF
return CFTimeIndex(np.array(self) + other)
DatasetResample, grouped over '__resample_dim__'
1 groups with labels 0001-01-31, 00:00:00.
>>> ds.resample(time="ME").mean().compute()
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: year=0 was specified - this date/calendar/year zero convention is not supported by CF
reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: this date/calendar/year zero convention is not supported by CF
reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:308: CFWarning: this date/calendar/year zero convention is not supported by CF
return date.replace(year=year, month=month, day=day)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftimeindex.py:563: CFWarning: this date/calendar/year zero convention is not supported by CF
return CFTimeIndex(np.array(self) + other)
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
<xarray.Dataset> Size: 16B
Dimensions: (time: 1)
Coordinates:
* time (time) object 8B 0001-01-31 00:00:00
Data variables:
pr (time) float64 8B 0.4763
>>> xr.show_versions()
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.0
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.5.2
distributed: 2024.5.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.6.0
cupy: None
pint: 0.23
sparse: 0.15.4
flox: 0.9.8
numpy_groupies: 0.11.1
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: 8.25.0
sphinx: 7.3.7
This is the only diff in versions:
$ diff Deepak Spencer
4c4
< python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
---
> python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]
7,9c7,9
< OS-release: 23.2.0
< machine: arm64
< processor: arm
---
> OS-release: 23.5.0
> machine: x86_64
> processor: i386
Here's a minimal reproducer
import xarray as xr
from xarray.coding.cftime_offsets import MonthEnd
from xarray.core.resample_cftime import _get_time_bins
index = xr.date_range("0001-01-01", periods=10, freq="D")
datetime_bins, label = _get_time_bins(
    index,
    freq=MonthEnd(1),
    closed="right",
    label="right",
    origin="start_day",
    offset=None,
)
print(repr(datetime_bins[0]), '\n', repr(index[0]))
datetime_bins[0] < index[0] # should be True!
The output is
cftime.DatetimeGregorian(0, 12, 31, 23, 59, 59, 999999, has_year_zero=True)
cftime.DatetimeGregorian(1, 1, 1, 0, 0, 0, 0, has_year_zero=False)
The two dates have different has_year_zero values, which may be what's breaking the comparison (np.searchsorted gets used later).
So if I choose to start with year 2 it's all fine.
Thanks @dcherian, that diagnosis was super helpful. It makes sense that this is sort of an edge case, given that our original tests did not catch it. I think the issue is likely in here: https://github.com/pydata/xarray/blob/8c8d097816a70e35ef60de301503aa33f662857c/xarray/coding/cftime_offsets.py#L301-L321
In other words we need to ensure that we take whether has_year_zero is True or False into account when computing the new year when shifting months (this is what ends up creating the instance with has_year_zero=True in the example). When I get a chance I'll see if I can push a fix to #9116.
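To make the idea concrete, here is a rough sketch of year arithmetic for a month shift that respects `has_year_zero`. It is not xarray's implementation: `shifted_year_month` is a hypothetical helper working on plain integers, and it assumes that a calendar without a year zero goes straight from year -1 to year 1.

```python
def shifted_year_month(year, month, months, has_year_zero):
    """Return (year, month) after shifting by `months` (hypothetical helper).

    When has_year_zero is False, year 0 does not exist, so the result
    skips between year 1 and year -1 instead of landing on 0.
    """
    delta_year = (month + months) // 12
    new_month = (month + months) % 12
    if new_month == 0:
        # A remainder of 0 means December of the previous year.
        new_month = 12
        delta_year -= 1

    if not has_year_zero:
        if year < 0 and year + delta_year >= 0:
            delta_year += 1  # crossed forward over the missing year 0
        elif year > 0 and year + delta_year <= 0:
            delta_year -= 1  # crossed backward over the missing year 0

    return year + delta_year, new_month


# One month before January of year 1, under each convention.
print(shifted_year_month(1, 1, -1, has_year_zero=False))  # (-1, 12)
print(shifted_year_month(1, 1, -1, has_year_zero=True))   # (0, 12)
```

With the convention passed through explicitly, the shifted date can be built with the same `has_year_zero` as the input rather than falling back to a default.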
Thanks! Why would this be OS-dependent though?
There may be an underlying cftime issue related to comparison, e.g. this comparison is True on my machine, but False on yours:
>>> cftime.DatetimeGregorian(0, 12, 31, 23, 59, 59, 999999, has_year_zero=True) < cftime.DatetimeGregorian(1, 1, 1, has_year_zero=False)
<stdin>:1: CFWarning: this date/calendar/year zero convention is not supported by CF
True
but regardless I think generating dates with mismatched has_year_zero is an xarray bug.
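A regression test for this could check that bin edges and index values share one convention before any comparison happens. The sketch below shows the shape of such a check; `FakeDate` and `mismatched_year_zero` are hypothetical names, and `FakeDate` only mimics the `has_year_zero` attribute that cftime datetimes expose, so the snippet runs without cftime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeDate:
    """Hypothetical stand-in for a cftime datetime; only the fields used here."""
    year: int
    month: int
    day: int
    has_year_zero: bool


def mismatched_year_zero(dates):
    """Return True if the dates do not all share one has_year_zero setting."""
    conventions = {d.has_year_zero for d in dates}
    return len(conventions) > 1


# Mirrors the pair from the reproducer: bin edge vs. first index value.
bin_edge = FakeDate(0, 12, 31, has_year_zero=True)
first_index_value = FakeDate(1, 1, 1, has_year_zero=False)
print(mismatched_year_zero([bin_edge, first_index_value]))  # True
```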