cftime resampling error

Open dcherian opened this issue 1 year ago • 3 comments

What happened?

CFTime resampling fails with a ValueError for some inputs, e.g. daily data starting in year 1.

What did you expect to happen?

No error

Minimal Complete Verifiable Example

import dask.array
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"pr": ("time", dask.array.random.random((10,), chunks=(10,)))},
    coords={"time": xr.date_range("0001-01-01", periods=10, freq="D")},
)
ds.resample(time="ME")
ValueError: Data shape (9,) must match shape of object (10,)

dcherian avatar Jun 12 '24 21:06 dcherian

Can you show the output of xr.show_versions()? I am actually not able to reproduce this in the two environments I've tried (one has xarray main installed):

>>> xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:50:49) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.4.3.dev863+gce196d56
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: 3.9.0
bottleneck: 1.3.8
dask: 2024.5.2
distributed: 2024.5.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.6.0
cupy: None
pint: None
sparse: 0.15.4
flox: 0.9.8
numpy_groupies: 0.11.1
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: None
sphinx: None

spencerkclark avatar Jun 13 '24 00:06 spencerkclark

Here are the versions:

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.2.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.0
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.5.2
distributed: 2024.5.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.6.0
cupy: None
pint: 0.23
sparse: 0.15.4
flox: 0.9.8
numpy_groupies: 0.11.1
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: 8.25.0
sphinx: 7.3.7

I get a bunch of these warnings too:

/Users/deepak/miniforge3/envs/xarray-release/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: year=0 was specified - this date/calendar/year zero convention is not supported by CF
  reference = type(date)(year, month, 1)

dcherian avatar Jun 13 '24 03:06 dcherian

Yeah, I get those warnings too. We may decide to silence them, but I think that's a separate issue (not that it excuses them, but I think they have existed for a while in this case).

Weirdly I still cannot reproduce the ValueError with an environment built to be just like yours:

$ python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dask.array; import numpy as np; import xarray as xr
>>> ds = xr.Dataset({"pr": ("time", dask.array.random.random((10,), chunks=(10,)))},coords={"time": xr.date_range("0001-01-01", periods=10, freq="D")},)
>>> ds.resample(time="ME")
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: year=0 was specified - this date/calendar/year zero convention is not supported by CF
  reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: this date/calendar/year zero convention is not supported by CF
  reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
  return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:308: CFWarning: this date/calendar/year zero convention is not supported by CF
  return date.replace(year=year, month=month, day=day)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
  return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftimeindex.py:563: CFWarning: this date/calendar/year zero convention is not supported by CF
  return CFTimeIndex(np.array(self) + other)
DatasetResample, grouped over '__resample_dim__'
1 groups with labels 0001-01-31, 00:00:00.
>>> ds.resample(time="ME").mean().compute()
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: year=0 was specified - this date/calendar/year zero convention is not supported by CF
  reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: this date/calendar/year zero convention is not supported by CF
  reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
  return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:308: CFWarning: this date/calendar/year zero convention is not supported by CF
  return date.replace(year=year, month=month, day=day)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
  return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftimeindex.py:563: CFWarning: this date/calendar/year zero convention is not supported by CF
  return CFTimeIndex(np.array(self) + other)
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
<xarray.Dataset> Size: 16B
Dimensions:  (time: 1)
Coordinates:
  * time     (time) object 8B 0001-01-31 00:00:00
Data variables:
    pr       (time) float64 8B 0.4763
>>> xr.show_versions()
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.0
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.5.2
distributed: 2024.5.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.6.0
cupy: None
pint: 0.23
sparse: 0.15.4
flox: 0.9.8
numpy_groupies: 0.11.1
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: 8.25.0
sphinx: 7.3.7

This is the only diff in versions:

$ diff Deepak Spencer
4c4
< python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
---
> python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]
7,9c7,9
< OS-release: 23.2.0
< machine: arm64
< processor: arm
---
> OS-release: 23.5.0
> machine: x86_64
> processor: i386

spencerkclark avatar Jun 13 '24 09:06 spencerkclark

Here's a minimal reproducer

import xarray as xr
from xarray.coding.cftime_offsets import MonthEnd
from xarray.core.resample_cftime import _get_time_bins

index = xr.date_range("0001-01-01", periods=10, freq="D")
datetime_bins, label = _get_time_bins(
    index,
    freq=MonthEnd(1),
    closed="right",
    label="right",
    origin="start_day",
    offset=None,
)
print(repr(datetime_bins[0]), '\n', repr(index[0]))
datetime_bins[0] < index[0] # should be True!

The output is

cftime.DatetimeGregorian(0, 12, 31, 23, 59, 59, 999999, has_year_zero=True) 
 cftime.DatetimeGregorian(1, 1, 1, 0, 0, 0, 0, has_year_zero=False)

The two have different has_year_zero values, which may be what's screwing up the comparison (np.searchsorted gets used later).

So if I choose to start with year 2 it's all fine.
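
To illustrate how a mis-ordered first bin edge could turn into the (9,) vs (10,) shape mismatch, here is a toy model. It assumes the bin edges are located in the index with a right-sided sorted search; `bisect.bisect_right` stands in for `np.searchsorted(..., side="right")`, plain numbers stand in for the cftime dates, and `bin_counts` is a hypothetical helper, not xarray's code.

```python
from bisect import bisect_right


def bin_counts(index, edges):
    """Count how many sorted index values fall in each (left, right] bin."""
    # Position of each edge in the index, like a right-sided searchsorted.
    positions = [bisect_right(index, edge) for edge in edges]
    # Elements per bin are the differences between consecutive positions.
    return [hi - lo for lo, hi in zip(positions, positions[1:])]


# Ten daily timestamps, represented as day numbers 1..10.
index = list(range(1, 11))

# First edge sorts strictly before the first timestamp: all ten values land in the bin.
good_edges = [0.999, 31]

# First edge sorts at (or after) the first timestamp, as if the
# "edge < first timestamp" comparison were False: one value is lost.
bad_edges = [1, 31]

good_counts = bin_counts(index, good_edges)
bad_counts = bin_counts(index, bad_edges)
print(good_counts, bad_counts)
```

In this model the group codes built from `bad_counts` would cover only nine of the ten time steps, which is the kind of length mismatch the ValueError reports.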

dcherian avatar Jul 26 '24 21:07 dcherian

Thanks @dcherian, that diagnosis was super helpful. It makes sense that this is sort of an edge case, given that our original tests did not catch it. I think the issue is likely in here: https://github.com/pydata/xarray/blob/8c8d097816a70e35ef60de301503aa33f662857c/xarray/coding/cftime_offsets.py#L301-L321 In other words, we need to take into account whether has_year_zero is True or False when computing the new year while shifting months (this is what ends up creating the instance with has_year_zero=True in the example). When I get a chance I'll see if I can push a fix to #9116.
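
As a rough sketch of what "taking has_year_zero into account" could look like, the arithmetic below shifts a (year, month) pair and skips year 0 when the calendar has no year zero. `shift_year_month` is a hypothetical standalone helper written for illustration; it is not the code at the link above and not necessarily the fix that ends up in xarray.

```python
def shift_year_month(year, month, months, has_year_zero):
    """Return (year, month) after shifting by a number of calendar months.

    Hypothetical helper: when has_year_zero is False, year 0 does not
    exist, so crossing it moves directly between year -1 and year 1.
    """
    total = month + months
    new_year = year + total // 12
    new_month = total % 12
    if new_month == 0:
        # A remainder of 0 means December of the previous year.
        new_month = 12
        new_year -= 1

    if not has_year_zero:
        if year < 0 <= new_year:
            # Crossed from negative years forward over the missing year 0.
            new_year += 1
        elif new_year <= 0 < year:
            # Crossed from positive years backward over the missing year 0.
            new_year -= 1
    return new_year, new_month


# One month before 0001-01 under each convention.
with_zero = shift_year_month(1, 1, -1, has_year_zero=True)
without_zero = shift_year_month(1, 1, -1, has_year_zero=False)
print(with_zero, without_zero)
```

Under this sketch, the month before 0001-01 is 0000-12 only when the calendar has a year zero; otherwise it is December of year -1, so the shifted date keeps the same convention as the input.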

spencerkclark avatar Jul 30 '24 21:07 spencerkclark

Thanks! Why would this be OS-dependent though?

dcherian avatar Jul 31 '24 20:07 dcherian

There may be an underlying cftime issue related to comparison, e.g. this comparison is True on my machine, but False on yours:

>>> cftime.DatetimeGregorian(0, 12, 31, 23, 59, 59, 999999, has_year_zero=True) < cftime.DatetimeGregorian(1, 1, 1, has_year_zero=False)
<stdin>:1: CFWarning: this date/calendar/year zero convention is not supported by CF
True

but regardless I think generating dates with mismatched has_year_zero is an xarray bug.
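
A guard along these lines could flag the mismatch before any comparison happens. This is only a sketch: `is_consistent` is a hypothetical helper, and `SimpleNamespace` objects stand in for cftime dates so the snippet runs without cftime installed; the only thing assumed about real dates is the `has_year_zero` attribute shown in the reprs above.

```python
from types import SimpleNamespace


def is_consistent(dates):
    """Return True if all dates share a single has_year_zero convention."""
    conventions = {d.has_year_zero for d in dates}
    return len(conventions) <= 1


# Stand-ins mirroring the two reprs from the reproducer output.
bin_edge = SimpleNamespace(year=0, month=12, day=31, has_year_zero=True)
first_timestamp = SimpleNamespace(year=1, month=1, day=1, has_year_zero=False)

print(is_consistent([bin_edge, first_timestamp]))
```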

spencerkclark avatar Jul 31 '24 20:07 spencerkclark