
.min() doesn't work on np.datetime64 with a chunked Dataset

Open ludwigVonKoopa opened this issue 4 years ago • 4 comments

Hi all,

If an xr.Dataset is chunked, I cannot call ds.time.min(); I get the error: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]'). Is this expected? Oddly, ds2.time.mean() works.

Thanks

What happened:

Calling .min() raised a UFuncTypeError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

What you expected to happen:

compute the min & max on a chunked datetime64 xarray.DataArray

Minimal Complete Verifiable Example:

import xarray as xr
import numpy as np

obs=200
t0 = np.datetime64("2010-01-01T00:00:00")
tn = t0 + np.timedelta64(123*4, "D")

ds2 = xr.Dataset(
    {
        "time": (["obs"], np.arange(t0, tn, (tn-t0)/obs)),
    },
    coords={
        "obs": (["obs"], np.arange(obs)),
    },
).chunk({"obs": 100})

ds2.time.min()
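Until the underlying reduction supports datetimes, one possible workaround (a sketch, assuming dask is installed) is to reduce over the integer view of the timestamps and cast the result back to the original datetime dtype:

```python
import numpy as np
import xarray as xr

obs = 200
t0 = np.datetime64("2010-01-01T00:00:00")
tn = t0 + np.timedelta64(123 * 4, "D")

ds2 = xr.Dataset(
    {"time": (["obs"], np.arange(t0, tn, (tn - t0) / obs))},
    coords={"obs": (["obs"], np.arange(obs))},
).chunk({"obs": 100})

# Reduce over the integer representation, then cast back to the original
# datetime dtype so the time unit round-trips correctly.
t_min = ds2.time.astype("int64").min().astype(ds2.time.dtype)
```

Casting back with `ds2.time.dtype` (rather than hard-coding `datetime64[ns]`) keeps this correct if the variable uses a coarser time unit.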

Anything else we need to know?:

ds2.time.mean() works, but .max() and .min() raise the exception above.

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-133-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.2.1
numpy: 1.19.5
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 52.0.0.post20210125
pip: 20.3.3
conda: None
pytest: 6.2.2
IPython: 7.20.0
sphinx: 3.5.0

ludwigVonKoopa avatar Mar 05 '21 11:03 ludwigVonKoopa

core.duck_array_ops.mean already has a custom wrapper for datetime arrays. It should not be a problem to generalize this to min and max as well. Maybe a more generic wrapper would be the best solution there?
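The trick that the datetime wrapper in duck_array_ops relies on is easy to generalize: reduce over integer ticks relative to a reference time, then add the reference back. A minimal self-contained sketch (the `datetime_reduce` name and signature are illustrative, not xarray's API):

```python
import numpy as np

def datetime_reduce(arr, reduce_func):
    """Apply an integer reduction to a datetime64 array via an epoch offset.

    Illustrative sketch only; the names here are not xarray's.
    """
    offset = arr.min()                           # reference time
    delta = (arr - offset).astype("int64")       # timedelta64 -> integer ticks
    td_dtype = arr.dtype.str.replace("M", "m")   # datetime64[u] -> timedelta64[u]
    return offset + reduce_func(delta).astype(td_dtype)

times = np.arange(np.datetime64("2010-01-01"), np.datetime64("2010-01-11"))
datetime_reduce(times, np.max)  # same as times.max()
```

Because the reduction itself runs on plain integers, any `reduce_func` that dask and numpy already support (min, max, median, ...) works without datetime-specific ufunc support.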

headtr1ck avatar May 01 '22 15:05 headtr1ck

Yeah, that's a good idea. We should check whether dask & numpy support this now.

dcherian avatar May 01 '22 16:05 dcherian

Yes, adding a custom wrapper derived from core.duck_array_ops.mean sounds like a good approach. It would allow handling datetime arrays for operators other than .mean().

I ran into the same issue when applying .median() or .str() to an xarray.core.resample.DataSetResample object. In my case, I need to apply these operations to the dataset variables while still keeping meaningful values for the time coordinate — for example, retaining a representative timestamp (such as the mean time) even when using .median() or .str() on the variables.

I experimented with building a wrapper similar to core.duck_array_ops.mean, and it works for my use case:

_median = _create_nan_agg_method("median", invariant_0d=True)

def median(array, axis=None, skipna=None, **kwargs):
    """In-house median that can handle np.datetime64 or cftime.datetime
    dtypes."""
    from xarray.core.common import _contains_cftime_datetimes
    array = asarray(array)
    if dtypes.is_datetime_like(array.dtype):
        dmin = _datetime_nanreduce(array, min).astype("datetime64[Y]").astype(int)
        dmax = _datetime_nanreduce(array, max).astype("datetime64[Y]").astype(int)
        offset = (
            np.array((dmin + dmax) // 2).astype("datetime64[Y]").astype(array.dtype)
        )
        # From version 2025.01.2 xarray uses np.datetime64[unit], where unit
        # is one of "s", "ms", "us", "ns".
        # To not have to worry about the resolution, we just convert the output
        # to "timedelta64" (without unit) and let the dtype of offset take precedence.
        # This is fully backwards compatible with datetime64[ns].
        return (
            _median(
                datetime_to_numeric(array, offset), axis=axis, skipna=skipna, **kwargs
            ).astype("timedelta64")
            + offset
        )
    elif _contains_cftime_datetimes(array):
        offset = min(array)
        timedeltas = datetime_to_numeric(array, offset, datetime_unit="us")
        median_timedeltas = _median(timedeltas, axis=axis, skipna=skipna, **kwargs)
        return _to_pytimedelta(median_timedeltas, unit="us") + offset
    else:
        return _median(array, axis=axis, skipna=skipna, **kwargs)


median.numeric_only = True  # type: ignore[attr-defined]

I’m not a very advanced Python programmer, so I’m sure there are cleaner and more robust/generic ways to solve this. But I hope this example helps illustrate the need for a general wrapper that supports datetime handling for other reduction operations on xarray.core.resample.DataArrayResample or xarray.core.groupby.DatasetGroupBy objects.
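Stripped of the xarray internals, the core of the wrapper above can be checked with plain NumPy. A hedged sketch (the `datetime_median` helper is hypothetical, not xarray API):

```python
import numpy as np

def datetime_median(arr):
    """Median of a datetime64 array via the same offset trick (illustrative)."""
    offset = arr.min()
    delta = (arr - offset).astype("int64")   # integer ticks since offset
    med = np.median(delta)                   # float number of ticks
    unit = np.datetime_data(arr.dtype)[0]    # e.g. "D", "s", "ns"
    return offset + np.timedelta64(int(round(med)), unit)

# Nine consecutive days -> the median is the middle day.
times = np.arange(np.datetime64("2010-01-01"), np.datetime64("2010-01-10"))
datetime_median(times)
```

Note that for even-length inputs `np.median` interpolates between the two middle values, so the result is rounded to the nearest whole tick here; a production version would need to decide how to handle that half-tick case.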

schromain avatar Nov 26 '25 14:11 schromain

For min and max it's possible to add support within dask itself; they shouldn't require special handling on the xarray side.

jsignell avatar Dec 04 '25 19:12 jsignell

I think this can be closed now that the dask PR is in and released.

jsignell avatar Dec 18 '25 22:12 jsignell