`.sel()` fails on `datetime64[s]` object
What happened?
Hi there,
sorry if this might be a duplicate, but I have been browsing the repo without finding anything specific that resembles this.
I am exploring the possibility of calling xarray with CFDatetimeCoder on a time period that overshoots the pandas year-2262 threshold.
Running with xarray=2025.9.0
import xarray as xr
files="/ec/res4/scratch/ecme3497/ece4/pic2/output/oifs/pic2_atm_cmip6_1m_24*.nc"
coder = xr.coders.CFDatetimeCoder(time_unit='s')
data = xr.open_mfdataset(files, decode_times=coder)
data.time_counter
which gives me
array(['2400-01-16T12:00:00', '2400-02-15T12:00:00', '2400-03-16T12:00:00',
..., '2489-10-16T12:00:00', '2489-11-16T00:00:00',
'2489-12-16T12:00:00'], shape=(1080,), dtype='datetime64[s]')
so far so good. However
data.sel(time_counter=slice("2400-01-01", "2420-01-01"))
OverflowError Traceback (most recent call last)
File pandas/_libs/tslibs/period.pyx:1169, in pandas._libs.tslibs.period.period_ordinal_to_dt64()
OverflowError: Overflow occurred in npy_datetimestruct_to_datetime
The above exception was the direct cause of the following exception:
OutOfBoundsDatetime Traceback (most recent call last)
Cell In[8], line 1
----> 1 data.sel(time_counter=slice("2400-01-01", "2420-01-01"))
File /ECMWF_kQYjfeo/miniforge/envs/env1/lib/python3.12/site-packages/xarray/core/dataset.py:2974, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
2906 """Returns a new dataset with each array indexed by tick labels
2907 along the specified dimension(s).
2908
(...) 2971
2972 """
2973 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 2974 query_results = map_index_queries(
2975 self, indexers=indexers, method=method, tolerance=tolerance
2976 )
2978 if drop:
2979 no_scalar_variables = {}
...
File pandas/_libs/tslibs/period.pyx:1992, in pandas._libs.tslibs.period._Period.to_timestamp()
File pandas/_libs/tslibs/period.pyx:1172, in pandas._libs.tslibs.period.period_ordinal_to_dt64()
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2400-01-01 00:00:00
I know this can be fixed using cftime, but I thought that using datetime64[s] would have fixed this issue. In general, I am quite confused about which features xarray currently supports for going beyond the nanosecond limitation of pandas. Shall we keep working with cftime? Is there something I am missing? To what extent can we use the coder if this is the output?
Any help, also suggestion to documentation, is greatly appreciated!
Thanks a lot
EDIT: I can upload some of the data if required, but below you can find a reproducible example.
What did you expect to happen?
Correct time selection, without errors.
Minimal Complete Verifiable Example
import numpy as np
import xarray as xr
# monthly starts from 2389-01 to 2489-12 (inclusive)
months = np.arange("2389-01", "2490-01", dtype="datetime64[M]")  # dtype M = month starts
times = months.astype("datetime64[s]")  # convert to seconds (YYYY-MM-01T00:00:00)
# create an example DataArray
da = xr.DataArray(np.zeros(times.size), coords={"time": times}, dims=["time"])
da.sel(time=slice("2400-01-01", "2420-01-01"))
I have been able to replicate this also with the most recent version of xarray:
>>> xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.14.0 | packaged by conda-forge | (main, Dec 2 2025, 20:23:19) [Clang 20.1.8 ]
python-bits: 64
OS: Darwin
OS-release: 23.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: ('it_IT', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2025.11.0
pandas: 2.3.3
numpy: 2.3.5
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: 25.3
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Environment
xarray>=2025.9.0
I can reproduce this with just
import pandas as pd
dates = pd.date_range("2400-01-15T12:00:00", freq="ME", periods=10, unit="s")
dates.slice_locs("2400-01-01", "2400-06-01")
which means we inherit this behavior from pandas.DatetimeIndex. Good news is that you can work around this by wrapping your timestamps in pd.Timestamp(t, unit="s").
cc @spencerkclark
Ah, that's curious! Actually this works too; no need to specify unit="s":
import numpy as np
import xarray as xr
import pandas as pd
months = np.arange("2389-01", "2490-01", dtype="datetime64[M]") # dtype M = month starts
times = months.astype("datetime64[s]") # convert to seconds (YYYY-MM-01T00:00:00)
da = xr.DataArray(np.zeros(times.size), coords={"time": times}, dims=["time"])
da.sel(time=slice(pd.Timestamp("2400-01-01"), pd.Timestamp("2420-01-01"))).time
Correct, this is an existing issue in pandas: https://github.com/pandas-dev/pandas/issues/56940. You could consider pinging it to get a sense for where it is on their roadmap.
It may not be a concern for your use-case, but I would offer a slight word of caution regarding using the pd.Timestamp workaround, since it has a different meaning than indexing with strings (it is not an exact drop-in replacement). With strings, pandas will implicitly expand the bounds to encompass the largest possible range of strings, while with pd.Timestamp objects, pandas will use the literal values as bounds. See this section of the pandas documentation for more details.
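To illustrate that semantic difference with a small in-bounds example (the dates here are arbitrary, chosen only to show the behavior):

```python
import pandas as pd

idx = pd.DatetimeIndex(["2021-01-01 06:00", "2021-01-31 18:00"])
s = pd.Series([1, 2], index=idx)

# String bounds are expanded: "2021-01-31" covers the whole day,
# so both entries are selected
print(len(s.loc["2021-01-01":"2021-01-31"]))  # 2

# pd.Timestamp bounds are literal: 2021-01-31 00:00:00 excludes
# the 18:00 entry
print(len(s.loc[pd.Timestamp("2021-01-01"):pd.Timestamp("2021-01-31")]))  # 1
```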
To the larger question of full non-nanosecond np.datetime64 support, indeed this is still a work in progress in pandas. There is a general tracking issue here: https://github.com/pandas-dev/pandas/issues/46587. A large number of the issues are resolved, though I am not sure how comprehensive or up-to-date it is (e.g., it does not seem to include this issue). In the bigger picture, cftime support is not being deprecated. Independent of the time range issue, there is still a need for non-standard calendar support, so if that supports the features you require, it is safe to continue using that.
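As a rough illustration of the current state, many operations on a non-nanosecond DatetimeIndex already work past 2262 when the index is built from datetime64[s] values directly; a sketch reusing the construction from the MVCE above, not an exhaustive survey:

```python
import numpy as np
import pandas as pd

# Second-resolution index spanning 2389-2489, well past the ns limit
times = np.arange("2389-01", "2490-01", dtype="datetime64[M]").astype("datetime64[s]")
idx = pd.DatetimeIndex(times)

print(idx.dtype)   # datetime64[s] -- the unit is preserved
print(idx.max())   # 2489-12-01 00:00:00

# Slicing with pd.Timestamp bounds works; string bounds currently hit
# the OutOfBoundsDatetime error discussed above
print(idx.slice_locs(pd.Timestamp("2400-01-01"), pd.Timestamp("2420-01-01")))
```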
Correct, this is an existing issue in pandas: pandas-dev/pandas#56940. You could consider pinging it to get a sense for where it is on their roadmap.
Thanks will do!
It may not be a concern for your use-case, but I would offer a slight word of caution regarding using the pd.Timestamp workaround, since it has a different meaning than indexing with strings (it is not an exact drop-in replacement). With strings, pandas will implicitly expand the bounds to encompass the largest possible range of strings, while with pd.Timestamp objects, pandas will use the literal values as bounds. See this section of the pandas documentation for more details.
So this is why, on the xarray side, you do not force conversion to pd.Timestamp, and why you get this error. At first I thought this was due to some missing parsing on your side, but now I see.
To the larger question of full non-nanosecond np.datetime64 support, indeed this is still a work in progress in pandas. There is a general tracking issue here: pandas-dev/pandas#46587. A large number of the issues are resolved, though I am not sure how comprehensive or up-to-date it is (e.g., it does not seem to include this issue). In the bigger picture, cftime support is not being deprecated. Independent of the time range issue, there is still a need for non-standard calendar support, so if that supports the features you require, it is safe to continue using that.
In our case (https://github.com/DestinE-Climate-DT/AQUA) we rely on pandas for time operations, so using cftime is not our first choice, but thanks for the comprehensive explanation.
Shall I close the issue or do you want to leave it as backlog?
Thanks, I see, that makes sense. You are welcome to keep this issue open, since I suspect others may run into this same problem and may wonder about the status / underlying cause.
In doing a little more investigation on the pandas side, I think addressing https://github.com/pandas-dev/pandas/issues/28104, which is on the general checklist, might help solve this one as well (though I think there would need to be a way to propagate the resolution for determining the appropriate end bound).