pandas BUG: In 1.5rc0 casting Series to datetime64 with specific but non-nanosecond units has no effect

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas.testing import assert_series_equal
print(f"{pd.__version__=}")

ser = pd.Series(["2000-02-02", "2000-03-03"], dtype="datetime64[ns]")

# Passes on pandas 1.4.4
# Fails on pandas 1.5.0rc0
# Fails on pandas 1.6.0.dev0+136.gbbf17ea692
assert_series_equal(
    ser.astype("datetime64[Y]"),
    pd.Series(["2000-01-01", "2000-01-01"], dtype="datetime64[ns]")
)

Issue Description

In pandas 1.4.4 and earlier, casting a datetime64[ns] Series to datetime64 with a coarser, non-nanosecond unit truncated the values to that unit, even though the resulting Series still had datetime64[ns] as its dtype. As of 1.5.0rc0 this behavior seems to have changed: casting to datetime64 with other units no longer has any effect on the values.

Based on comments in PR #48555 (closing issue #47844) referring to the numpy unit conversion it seems like this might not be the intended behavior, and it's a breaking change (we were relying on this behavior to turn month-start dates into the corresponding year-start dates). Snippet from that PR:

        elif (
            self.tz is None
            and is_datetime64_dtype(dtype)
            and dtype != self.dtype
            and is_unitless(dtype)
        ):
            # TODO(2.0): just fall through to dtl.DatetimeLikeArrayMixin.astype
            warnings.warn(
                "Passing unit-less datetime64 dtype to .astype is deprecated "
                "and will raise in a future version. Pass 'datetime64[ns]' instead",
                FutureWarning,
                stacklevel=find_stack_level(inspect.currentframe()),
            )
            # unit conversion e.g. datetime64[s]
            return self._ndarray.astype(dtype)

Expected Behavior

I expected the dates in the series to be adjusted to be consistent with the frequency of the datetime64 type used in astype(), as illustrated in the example above.

Installed Versions

INSTALLED VERSIONS

commit : bbf17ea692e437cec908eae6759ffff8092fb42e
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-47-generic
Version : #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.6.0.dev0+136.gbbf17ea692
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 65.3.0
pip : 22.2.2
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Sep 15 '22 23:09 zaneselvans

cc @jbrockmendel thoughts here?

Sep 19 '22 20:09 phofl

This can be fixed for 1.5.1. For 2.0 we'll support the units [s, ms, us, ns], and I think astype to anything else will raise. Long-term, I think the user should use .floor.
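As a rough illustration of the .floor suggestion, here is a minimal sketch using Series.dt.floor with fixed-width frequencies. The sample timestamps are made up for illustration and are not taken from the report.

```python
import pandas as pd

# Illustrative timestamps (assumed data, not from the original report).
ser = pd.Series(
    ["2000-02-02 13:45:10", "2000-03-03 08:15:59"], dtype="datetime64[ns]"
)

# Fixed-width frequencies such as days and minutes can be floored directly.
floored_to_day = ser.dt.floor("D")
floored_to_minute = ser.dt.floor("min")

print(floored_to_day)
print(floored_to_minute)
```

This only covers frequencies of constant length; months and years vary in length, so they need a different approach.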

Sep 19 '22 21:09 jbrockmendel

Messing around with this a bit, Series.dt.floor() works well for Day, Hour, and Minute frequencies, but it doesn't allow snapping to the start of years or months. For now I'm pulling out the numpy array, casting it to another frequency, and constructing a new Series. It seems like there must be a better way I'm not aware of.

import numpy as np
import pandas as pd

df = (
    pd.DataFrame()
    .assign(
        hourly=pd.Series(np.arange("2000-01-01", "2003-01-01", dtype="datetime64[h]")),
        daily=lambda x: x.hourly.dt.floor("D"),
        monthly=lambda x: pd.Series(x["hourly"].to_numpy().astype("datetime64[M]")),
        yearly=lambda x: pd.Series(x["hourly"].to_numpy().astype("datetime64[Y]")),
    )
)

df.sample(10)

       hourly               daily       monthly     yearly
5280   2000-08-08 00:00:00  2000-08-08  2000-08-01  2000-01-01
6378   2000-09-22 18:00:00  2000-09-22  2000-09-01  2000-01-01
18764  2002-02-20 20:00:00  2002-02-20  2002-02-01  2002-01-01
25739  2002-12-08 11:00:00  2002-12-08  2002-12-01  2002-01-01
8285   2000-12-11 05:00:00  2000-12-11  2000-12-01  2000-01-01
16114  2001-11-02 10:00:00  2001-11-02  2001-11-01  2001-01-01
15167  2001-09-23 23:00:00  2001-09-23  2001-09-01  2001-01-01
3424   2000-05-22 16:00:00  2000-05-22  2000-05-01  2000-01-01
19729  2002-04-02 01:00:00  2002-04-02  2002-04-01  2002-01-01
3148   2000-05-11 04:00:00  2000-05-11  2000-05-01  2000-01-01
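One alternative that is not mentioned in this thread, offered only as a sketch: a round trip through periods with Series.dt.to_period and Series.dt.to_timestamp appears to give the same month-start and year-start values without leaving pandas. The sample values below are two rows borrowed from the output above.

```python
import pandas as pd

ser = pd.Series(
    ["2000-08-08 00:00:00", "2002-12-08 11:00:00"], dtype="datetime64[ns]"
)

# Convert to monthly/yearly periods, then back to timestamps.
# to_timestamp() defaults to the start of each period.
monthly = ser.dt.to_period("M").dt.to_timestamp()
yearly = ser.dt.to_period("Y").dt.to_timestamp()

print(monthly)
print(yearly)
```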

Sep 21 '22 01:09 zaneselvans

"but it doesn't allow snapping to the start of years or months"

Good catch. I think we'd need something like pd.offsets.YearBegin().rollback(obj), but for arrays instead of scalars (xref #7449).

For 1.5.0 your best bet is, like you've found, doing the astype directly on the underlying numpy arrays.

For 1.5.1 we can restore that behavior.
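The workaround of casting the underlying numpy array could be wrapped in a small helper, sketched here. The name truncate_to_unit is hypothetical and not part of pandas. The cast back to datetime64[ns] is an assumption made so that the result keeps the same dtype across pandas versions.

```python
import pandas as pd


def truncate_to_unit(ser: pd.Series, unit: str) -> pd.Series:
    """Truncate datetimes to a numpy unit such as "M" or "Y".

    Hypothetical helper for illustration; not a pandas API.
    """
    # Casting to a coarser numpy unit drops the finer components.
    truncated = ser.to_numpy().astype(f"datetime64[{unit}]")
    # Cast back to nanoseconds and keep the original index and name.
    return pd.Series(
        truncated.astype("datetime64[ns]"), index=ser.index, name=ser.name
    )


ser = pd.Series(["2000-02-02", "2000-03-03"], dtype="datetime64[ns]")
year_starts = truncate_to_unit(ser, "Y")
month_starts = truncate_to_unit(ser, "M")
print(year_starts)
print(month_starts)
```

Passing the index through explicitly matters when the input Series does not have a default integer index.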

Sep 21 '22 16:09 jbrockmendel

Moving off 2.0, as ser.astype("datetime64[Y]") raises now anyway.

Mar 27 '23 16:03 MarcoGorelli

Is there a plan to restore ser.astype("datetime64[Y]") in future versions? Or will it always raise?

Apr 17 '23 13:04 lingyielia

Hey @lingyielia, this will continue to raise.

Are you trying to floor to the beginning of the year? If so, you could do:

ser + pd.tseries.offsets.YearBegin() - pd.tseries.offsets.YearBegin()

In the future, it should be possible to do pd.tseries.offsets.YearBegin().rollback(ser)
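A minimal sketch of the add-then-subtract approach on date-only values follows. The sample dates are assumptions chosen to include a value that already sits on a year start. For timestamps with a time-of-day component, calling ser.dt.normalize() first may be needed to land on midnight.

```python
import pandas as pd

# Illustrative dates, including one that is already a year start.
ser = pd.Series(
    ["2000-01-01", "2000-02-02", "2001-12-31"], dtype="datetime64[ns]"
)

year_begin = pd.tseries.offsets.YearBegin()

# Adding the offset rolls forward to the next January 1;
# subtracting it again steps back to the January 1 of the original year.
year_starts = ser + year_begin - year_begin
print(year_starts)
```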

Apr 17 '23 15:04 MarcoGorelli