pandas
pandas copied to clipboard
BUG: DateTimeIndex.is_year_start unexpected behavior when constructed with freq 'MS' date_range (#57377)
-
[x] closes #57377
-
[x] Tests added and passed if fixing a bug or adding a new feature
-
[x] All code checks passed.
-
[x] Added an entry in the latest
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature. -
A DateTimeIndex created via date_range with the freq parameter as 'MS' would yield a wrong is_year_start (and later it was discovered is_quarter_start) accessor. In
pandas/_libs/tslibs/fields.pyx
, the way in which the start and end months were set was one way for freq 'MS', 'QS', and 'YS' and another way for other offsets. However, only the latter two had the necessary properties to set start and end this way. As such, removing 'MS' from the list fixed the issue.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.
wait sorry, I think this is my fault for not having got round to reviewing it
reopening for now, hoping to get to it at some point before 3.0
@natmokval fancy taking a look too?
Thank you for working on this @mattheeter.
It looks good to me. I think the suggested changes will fix the bug in is_year_start
for MonthBegin
. We will get the correct result for freq="MS"
:
dr = pd.date_range("2017-12-01", periods=2, freq="MS")
print(dr) # DatetimeIndex(['2017-12-01', '2018-01-01'], dtype='datetime64[ns]', freq='MS')
dr_attr = dr.is_year_start
print(dr_attr) # [False True]
These changes won't fix a problem with some other frequencies such as: freq="QS-NOV"
:
dr = pd.date_range("2017-07-01", periods=2, freq="QS-NOV")
print(dr) # DatetimeIndex(['2017-08-01', '2017-11-01'], dtype='datetime64[ns]', freq='QS-NOV')
dr_comp = [dt.is_year_start for dt in dr]
dr_attr = dr.is_year_start
print(dr_attr) # [False True]
print(dr_comp) # [False, False]
It may not be necessary to correct this in the current pull request.
Of course! @natmokval
Yes, the original issue did not have the QS freq mentioned. I did point it out on the issue, and went ahead with what I thought would fix it there. I think that when I tested with other types of QS (e.g. November) I assumed that these should set the year start attribute to that of the quarter start, making me miss the fact that it was incorrect.
Should I continue to work on the QS bug, or just remove any changes that attempted to fix that one since the issue didn't reference it?
I think it's okay to leave the fix for "MS"
in this PR and open a new PR to work on the problem for frequencies such as "QS-NOV"
.
This PR is still Stale
. It can take a couple of days to change its status to active.
Hi @natmokval, I was wondering if you knew anything about the Numpy Dev tests that are failing? I know when I submitted this PR a while ago there were no issues with unit tests, but I just got some time to continue working on this but haven't made any code changes since.