BUG: Inconsistent resolution from constructing with and assigning datetime values
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from datetime import datetime
import pandas as pd
construct = pd.DataFrame([datetime(year=2024, month=1, day=26, hour=0, minute=0, second=0, microsecond=0)])
print(construct.info())
assign = pd.DataFrame([], index=[0])
assign["time"] = datetime(year=2024, month=1, day=26, hour=0, minute=0, second=0, microsecond=0)
print(assign.info())
assign_timestamp = pd.DataFrame([], index=[0])
assign_timestamp["time"] = pd.Timestamp(year=2024, month=1, day=26, hour=0, minute=0, second=0, microsecond=0)
print(assign_timestamp.info())
assign_loc = pd.DataFrame([], index=[0])
assign_loc.loc[0, "time"] = datetime(year=2024, month=1, day=26, hour=0, minute=0, second=0, microsecond=0)
print(assign_loc.info())
Issue Description
Hi! I've run into a problem with datetime columns. We hit it while upgrading our production environment from pandas 1.5.3 to 2.1.4. Before the upgrade, both constructing and assigning resulted in datetime64[ns]. As of 2.1.4, we get the following results:
print(construct.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 1 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 140.0 bytes
None
print(assign.info())
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 0 to 0
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 1 non-null datetime64[us]
dtypes: datetime64[us](1)
memory usage: 16.0 bytes
None
print(assign_timestamp.info())
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 0 to 0
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 1 non-null datetime64[us]
dtypes: datetime64[us](1)
memory usage: 16.0 bytes
None
print(assign_loc.info())
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 0 to 0
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 1 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 124.0 bytes
None
While construction and loc assignment give the dtype datetime64[ns], df["time"] = datetime(...) results in datetime64[us].
I tested in a different env using pandas 2.2.0. The result is different when using loc:
print(assign_loc.info())
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 0 to 0
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 1 non-null datetime64[us]
dtypes: datetime64[us](1)
memory usage: 124.0 bytes
None
This time assign_loc gives datetime64[us], but construct still gives datetime64[ns].
I noticed in the release notes that astype was changed so that it keeps the original resolution of its input, but that does not obviously explain this inconsistency. It is causing widespread concat errors in our code due to mismatched dtypes, and we are concerned because we are not sure whether this is a bug or which behavior will eventually be stabilized. We would appreciate confirmation either way.
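As a possible stopgap while the intended behavior is unclear, datetime columns can be cast to a single resolution before concatenating. The following is only a sketch: `normalize_datetime_unit` is a hypothetical helper (not part of pandas), and it assumes tz-naive datetime columns and unique column names.

```python
import numpy as np
import pandas as pd


def normalize_datetime_unit(df: pd.DataFrame, unit: str = "ns") -> pd.DataFrame:
    """Return a copy of df with every tz-naive datetime64 column cast to `unit`.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    for col in out.columns:
        # is_datetime64_dtype is True for tz-naive datetime64 of any resolution
        if pd.api.types.is_datetime64_dtype(out[col].dtype):
            out[col] = out[col].astype(f"datetime64[{unit}]")
    return out


# One frame built from microsecond data, one from nanosecond data
left = pd.DataFrame({"time": np.array(["2024-01-26"], dtype="datetime64[us]")})
right = pd.DataFrame({"time": np.array(["2024-01-27"], dtype="datetime64[ns]")})

# Cast both frames to the same unit before concatenating
combined = pd.concat(
    [normalize_datetime_unit(left), normalize_datetime_unit(right)],
    ignore_index=True,
)
print(combined.dtypes)
```

This only hides the symptom: it does not answer which resolution pandas is supposed to infer in each code path.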
Thank you so much and wish you all the best!
Expected Behavior
Consistent datetime resolution
Installed Versions
INSTALLED VERSIONS
commit : a671b5a8bf5dd13fb19f0e88edc679bc9e15c673
python : 3.11.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
machine : AMD64
processor : Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 2.1.4
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.20.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : 1.3.5
dataframe-api-compat : None
fastparquet : 2023.8.0
fsspec : 2023.10.0
gcsfs : None
matplotlib : 3.8.0
numba : 0.58.1
numexpr : 2.8.7
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : 2.0.25
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
TEST 2.2.0:
INSTALLED VERSIONS
commit : f538741432edf55c6b9fb5d0d496d2dd1d7c2457
python : 3.12.0.final.0
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22621
machine : AMD64
processor : Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 2.2.0
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
I bisected this issue; the regression was introduced by this commit:
a2bb939ebd3aa9d78f96a72fa575f67b076f0bfa is the first bad commit
commit a2bb939ebd3aa9d78f96a72fa575f67b076f0bfa
Author: jbrockmendel
Date: Thu May 18 09:11:48 2023 -0700
API/BUG: infer_dtype_from_scalar with non-nano (#52212)
* API/BUG: infer_dtype_from_scalar with non-nano
* update test
* xfail on 32bit
* fix xfail condition
* whatsnew
* xfail on windows
File used for bisecting
# t.py file
from datetime import datetime
import pandas as pd
construct = pd.DataFrame([datetime(year=2024, month=1, day=26, hour=0, minute=0, second=0, microsecond=0)])
assert str(construct.iloc[:,0].dtype) == 'datetime64[ns]'
assign = pd.DataFrame([], index=[0])
assign["time"] = datetime(year=2024, month=1, day=26, hour=0, minute=0, second=0, microsecond=0)
assert str(assign.iloc[:,0].dtype) == 'datetime64[ns]'
assign_timestamp = pd.DataFrame([], index=[0])
assign_timestamp["time"] = pd.Timestamp(year=2024, month=1, day=26, hour=0, minute=0, second=0, microsecond=0)
assert str(assign_timestamp.iloc[:,0].dtype) == 'datetime64[ns]'
assign_loc = pd.DataFrame([], index=[0])
assign_loc.loc[0, "time"] = datetime(year=2024, month=1, day=26, hour=0, minute=0, second=0, microsecond=0)
assert str(assign_loc.iloc[:,0].dtype) == 'datetime64[ns]'
Before that commit, all dtypes were datetime64[ns].
After:
>>> print(pd.__version__)
2.1.0.dev0+797.ga2bb939ebd.dirty
>>> str(construct.iloc[:,0].dtype)
'datetime64[ns]'
>>> str(assign.iloc[:,0].dtype)
'datetime64[us]'
>>> str(assign_timestamp.iloc[:,0].dtype)
'datetime64[us]'
>>> str(assign_loc.iloc[:,0].dtype)
'datetime64[ns]'
Also, I think whether the resolution is ns or us depends on whether we are assigning a scalar or a list. Referencing https://github.com/pandas-dev/pandas/issues/55014#issuecomment-1764341895
cc @jbrockmendel
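A quick way to check the scalar-versus-list hypothesis on a given installation is to assign the same datetime both ways and compare the resulting dtypes. This is a minimal sketch; the names `scalar_df` and `list_df` are just illustrative, and no expected output is shown because the result depends on the pandas version.

```python
from datetime import datetime

import pandas as pd

value = datetime(2024, 1, 26)

# Assign the datetime as a bare scalar (broadcast to every row)
scalar_df = pd.DataFrame(index=[0])
scalar_df["time"] = value

# Assign the same datetime wrapped in a list
list_df = pd.DataFrame(index=[0])
list_df["time"] = [value]

# Print the resulting resolutions; the values depend on the pandas version
print(pd.__version__)
print("scalar:", scalar_df["time"].dtype)
print("list:  ", list_df["time"].dtype)
```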
Should be addressed by #55901