pandas icon indicating copy to clipboard operation
pandas copied to clipboard

BUG: DateTimeIndex.is_year_start unexpected behavior when constructed with freq 'MS' date_range (#57377)

Open mattheeter opened this issue 1 year ago • 7 comments

  • [x] closes #57377

  • [x] Tests added and passed if fixing a bug or adding a new feature

  • [x] All code checks passed.

  • [x] Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

  • A DateTimeIndex created via date_range with the freq parameter as 'MS' would yield a wrong is_year_start (and later it was discovered is_quarter_start) accessor. In pandas/_libs/tslibs/fields.pyx, the way in which the start and end months were set was one way for freq 'MS', 'QS', and 'YS' and another way for other offsets. However, only the latter two had the necessary properties to set start and end this way. As such, removing 'MS' from the list fixed the issue.

mattheeter avatar Feb 19 '24 00:02 mattheeter

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

github-actions[bot] avatar Mar 25 '24 00:03 github-actions[bot]

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

mroeschke avatar Mar 25 '24 18:03 mroeschke

wait sorry, I think this is my fault for not having got round to reviewing it

reopening for now, hoping to get to it at some point before 3.0

@natmokval fancy taking a look too?

MarcoGorelli avatar Mar 25 '24 18:03 MarcoGorelli

Thank you for working on this @mattheeter.

It looks good to me. I think the suggested changes will fix the bug in is_year_start for MonthBegin. We will get the correct result for freq="MS":

dr = pd.date_range("2017-12-01", periods=2, freq="MS")
print(dr) # DatetimeIndex(['2017-12-01', '2018-01-01'], dtype='datetime64[ns]', freq='MS')
dr_attr = dr.is_year_start
print(dr_attr) # [False  True]

These changes won't fix a problem with some other frequencies such as: freq="QS-NOV":

dr = pd.date_range("2017-07-01", periods=2, freq="QS-NOV")
print(dr) # DatetimeIndex(['2017-08-01', '2017-11-01'], dtype='datetime64[ns]', freq='QS-NOV')
dr_comp = [dt.is_year_start for dt in dr]
dr_attr = dr.is_year_start
print(dr_attr) # [False  True]
print(dr_comp) # [False, False]

It may not be necessary to correct this in the current pull request.

natmokval avatar Mar 25 '24 20:03 natmokval

Of course! @natmokval

Yes, the original issue did not have the QS freq mentioned. I did point it out on the issue, and went ahead with what I thought would fix it there. I think that when I tested with other types of QS (e.g. November) I assumed that these should set the year start attribute to that of the quarter start, making me miss the fact that it was incorrect.

Should I continue to work on the QS bug, or just remove any changes that attempted to fix that one since the issue didn't reference it?

mattheeter avatar Mar 25 '24 22:03 mattheeter

I think it's okay to leave the fix for "MS" in this PR and open a new PR to work on the problem for frequencies such as "QS-NOV".

This PR is still Stale. It can take a couple of days to change its status to active.

natmokval avatar Mar 26 '24 22:03 natmokval

Hi @natmokval, I was wondering if you knew anything about the Numpy Dev tests that are failing? I know when I submitted this PR a while ago there were no issues with unit tests, but I just got some time to continue working on this but haven't made any code changes since.

mattheeter avatar Apr 22 '24 19:04 mattheeter