BUG: DatetimeIndex.is_year_start breaks on double-digit frequencies
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

dr = pd.date_range("2017-01-01", periods=2, freq="10YS")
print(dr.is_year_start)
Issue Description
This outputs
array([False, False])
Expected Behavior
array([True, True])
The fixed-position slicing of the frequency string here may be to blame:
https://github.com/pandas-dev/pandas/blob/f2c8715245f3b1a5b55f144116f24221535414c6/pandas/_libs/tslibs/fields.pyx#L256-L257
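To see why a two-digit multiplier matters, here is a minimal pure-Python sketch. It assumes the linked lines compare fixed slices of the frequency string (characters 0 to 2 and 1 to 3) against the start-anchored codes; the function name is hypothetical and this is not the actual fields.pyx code.

```python
# Hypothetical reconstruction of a fixed-position check: a start-anchored
# code ("MS", "QS", "YS") is only recognized at the very start of the string
# or after exactly one leading character (a single digit, or presumably the
# "B" of a business code).
START_CODES = ("MS", "QS", "YS")


def is_start_freq_fixed_slicing(freqstr: str) -> bool:
    return freqstr[0:2] in START_CODES or freqstr[1:3] in START_CODES


for freqstr in ["YS-JAN", "2YS-JAN", "10YS-JAN"]:
    print(freqstr, is_start_freq_fixed_slicing(freqstr))
```

For "10YS-JAN" the two slices are "10" and "0Y", so the year-start offset goes unrecognized. Per the code comment quoted later in this thread, offsets that are not start-anchored treat the month as the ending month, which would presumably explain why January 1 is not reported as a year start.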
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.146.1-microsoft-standard-WSL2
Version : #1 SMP Thu Jan 11 04:09:03 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 65.5.0
pip : 24.0
Cython : None
pytest : 8.1.1
hypothesis : 6.100.1
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.23.0
pandas_datareader : None
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.3.1
gcsfs : None
matplotlib : 3.8.4
numba : 0.59.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
take
I replaced the absolute string slicing with a split to resolve this issue:
# YearBegin(), BYearBegin() use month = starting month of year.
# QuarterBegin(), BQuarterBegin() use startingMonth = starting
# month of year. Other offsets use month, startingMonth as ending
# month of year.
period_str = "".join(
    [dt_char for dt_char in freqstr.split("-")[0] if not dt_char.isdigit()]
)
if period_str in ["MS", "QS", "YS"]:
    end_month = 12 if month_kw == 1 else month_kw - 1
    start_month = month_kw
After rebuilding, this gives me:
[4/4] Linking target pandas/_libs/tslibs/fields.cpython-310-x86_64-linux-gnu.so
[ True True]
I don't think this is the solution; we need to go up a few levels and look for something like freq.n.
@natmokval you'd done a bunch of work in this area, so this might be interesting to you to try fixing?
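The idea behind freq.n is to let the start/end decision depend on the offset itself (its type and multiplier) rather than on the raw string. Inside pandas the offset object already carries n, so no parsing would be needed there. As a stand-in that runs without pandas, this sketch separates a frequency string into a multiplier, a base code and an anchor with a regular expression, so the digits never reach the start/end check. parse_freq, is_start_anchored and the regex grammar are hypothetical, and the set of start-anchored codes (including the business variants) is an assumption.

```python
import re
from typing import Optional, Tuple

# Hypothetical grammar: optional signed multiplier, alphabetic base code,
# optional "-ANCHOR" suffix such as "-JAN".
_FREQ_RE = re.compile(r"(?P<n>[+-]?\d*)(?P<base>[A-Za-z]+)(?:-(?P<anchor>[A-Za-z]+))?")

# Assumed start-anchored base codes, including business variants.
START_CODES = {"MS", "QS", "YS", "BMS", "BQS", "BYS"}


def parse_freq(freqstr: str) -> Tuple[int, str, Optional[str]]:
    """Split a frequency string into (n, base code, anchor)."""
    match = _FREQ_RE.fullmatch(freqstr)
    if match is None:
        raise ValueError(f"unrecognized frequency string: {freqstr!r}")
    n_str = match.group("n")
    # A missing multiplier means n == 1; a bare sign means +1 or -1.
    if n_str in ("", "+"):
        n = 1
    elif n_str == "-":
        n = -1
    else:
        n = int(n_str)
    return n, match.group("base"), match.group("anchor")


def is_start_anchored(freqstr: str) -> bool:
    # Only the base code matters; the multiplier n never reaches this check.
    _, base, _ = parse_freq(freqstr)
    return base in START_CODES


for freqstr in ["YS-JAN", "10YS-JAN", "-1YS-JAN", "10Y-DEC", "2B"]:
    print(freqstr, parse_freq(freqstr), is_start_anchored(freqstr))
```

Keeping n as a separate field is what makes "10YS-JAN" and "YS-JAN" behave identically here, which is the property the fix needs.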
Can you elaborate more on this one?
@natmokval you'd done a bunch of work in this area, so this might be interesting to you to try fixing?
Yeah, sure, I would like to work on this.
period_str = "".join([
dt_char for dt_char in list(freqstr.split("-")[0]) if not dt_char.isdigit()
])
This splits freqstr on "-" and removes the digits, so it can properly parse 10MS, 10YS-JAN, 2B, etc. I believe it is a more viable option than absolute slicing.
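The claim can be checked in isolation. This sketch lifts the period_str expression from the patch into a hypothetical helper, period_str_from, and runs it on the strings mentioned above. It also tries two further inputs, assuming that a negative multiplier appears in freqstr with a leading minus (for example "-1YS-JAN") and that BYearBegin uses the code "BYS".

```python
def period_str_from(freqstr: str) -> str:
    # Same expression as the proposed patch: keep the text before the
    # first "-" and drop every digit from it.
    return "".join(ch for ch in freqstr.split("-")[0] if not ch.isdigit())


START_CODES = ["MS", "QS", "YS"]  # the list checked in the patch

for freqstr in ["10MS", "10YS-JAN", "2B", "-1YS-JAN", "10BYS-JAN"]:
    period_str = period_str_from(freqstr)
    print(f"{freqstr!r} -> {period_str!r}, start-anchored: {period_str in START_CODES}")
```

The first three inputs behave as described. With a leading minus, splitting on "-" leaves only an empty prefix, and a business code reduces to "BYS", which is not in the patch's list; cases like these are presumably why relying on the offset's n, rather than on the string, was suggested.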