pandas icon indicating copy to clipboard operation
pandas copied to clipboard

CLN: enforce deprecation of frequencies deprecated for offsets

Open natmokval opened this issue 11 months ago • 11 comments

xref #52064, #55792, #55553, #55496 Enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets. Now the aliases are case-sensitive.

P.S. Corrected a note in v3.0.0 related to PR #57627

natmokval avatar Mar 24 '24 22:03 natmokval

I enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets, ci - green. @MarcoGorelli could you please take a look at this PR?

natmokval avatar Mar 25 '24 16:03 natmokval

/preview

MarcoGorelli avatar Mar 31 '24 20:03 MarcoGorelli

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/57986/

github-actions[bot] avatar Mar 31 '24 20:03 github-actions[bot]

Thanks for updating! A few things

* good to have the `c_PERIOD_TO_OFFSET_FREQSTR` dict, but I don't see why it's being used here:
  ```python
  if not is_period:
      if name.upper() in c_OFFSET_REMOVED_FREQSTR:
          raise ValueError(
              f"\'{name}\' is no longer supported for offsets. Please "
              f"use \'{c_OFFSET_REMOVED_FREQSTR.get(name.upper())}\' "
              f"instead."
          )
      # below we raise for lowrecase monthly and bigger frequencies
      if (name.upper() != name and
              name.lower() not in {"h", "min", "s", "ms", "us", "ns"} and
              name.upper() not in c_PERIOD_TO_OFFSET_FREQSTR and
              name.upper() in c_OFFSET_TO_PERIOD_FREQSTR):
          raise ValueError(INVALID_FREQ_ERR_MSG.format(name))
  ```

  If you've already checked `if not is_period`, when why do you need to check if it's in `c_PERIOD_TO_OFFSET_FREQSTR `?

we need the check if it isn't in c_PERIOD_TO_OFFSET_FREQSTR, because we did not deprecate lowercase frequencies "d", "b", "w", "weekday", "w-sun”, and so on. We don't want to raise a ValueError for these frequencies for both offsets and period. After deprecating these frequencies we can remove the check (only uppercase will be correct).

for example without this check test_reindex_axes in pandas/tests/frame/methods/test_reindex.py raises a ValueError for freq = "d"

natmokval avatar Apr 12 '24 11:04 natmokval

* `lowrecase` typo 

thanks, I corrected the typo

natmokval avatar Apr 12 '24 11:04 natmokval

* is this part temporary
  ```python
    elif name in {"d", "b"}:
      name = name.upper()
  elif (name.upper() not in {"B", "D"} and
          not name.upper().startswith("W")):
  ```
  If so, could you add a comment explaining why it needs to be there, possibly linking to an open issue? Ideally we should get to the point where we can get rid of all this complexity, so let's make it clear what the road towards that endpoint is

Yes, it's the temporary part. I left the comment below.

natmokval avatar Apr 12 '24 11:04 natmokval

thanks for explaining - is there a way to do that part without using c_PERIOD_TO_OFFSET_FREQSTR? let's try to separate them out a bit more, if c_PERIOD_TO_OFFSET_FREQSTR is for mapping period aliases to offset aliases, then we probably shouldn't be using it for offset aliases

MarcoGorelli avatar Apr 12 '24 12:04 MarcoGorelli

thanks for explaining - is there a way to do that part without using c_PERIOD_TO_OFFSET_FREQSTR? let's try to separate them out a bit more, if c_PERIOD_TO_OFFSET_FREQSTR is for mapping period aliases to offset aliases, then we probably shouldn't be using it for offset aliases

but what should we do with aliases which are the same for both: period and offsets, such as "D", "B", "W", "WEEKDAY", "W-SUN”, etc.? I think we need to keep them in both dictionaries c_PERIOD_TO_OFFSET_FREQSTR and c_OFFSET_TO_PERIOD_FREQSTR, we use them then we check case correctness. Oops, I forgot to add "WEEKDAY" in c_PERIOD_TO_OFFSET_FREQSTR, should I add it?

After enforcing the deprecation of "d", "b", "w", "weekday", "w-sun”, etc. we can make simplifications.

natmokval avatar Apr 12 '24 13:04 natmokval

but what should we do with aliases which are the same for both: period and offsets, such as "D", "B", "W", "WEEKDAY", "W-SUN”, etc.? I think we need to keep them in both dictionaries c_PERIOD_TO_OFFSET_FREQSTR and c_OFFSET_TO_PERIOD_FREQSTR, we use them then we check case correctness.

I'd suggest either that, or to add a set which contains aliases which are valid for both

MarcoGorelli avatar Apr 12 '24 15:04 MarcoGorelli

but what should we do with aliases which are the same for both: period and offsets, such as "D", "B", "W", "WEEKDAY", "W-SUN”, etc.? I think we need to keep them in both dictionaries c_PERIOD_TO_OFFSET_FREQSTR and c_OFFSET_TO_PERIOD_FREQSTR, we use them then we check case correctness.

I'd suggest either that, or to add a set which contains aliases which are valid for both

thanks, then maybe we can leave it as it is?

natmokval avatar Apr 12 '24 19:04 natmokval

cool, I think this is on the right track

I think there's a logic error somewhere, as it currently gives

In [7]: pd.period_range('2000', periods=3, freq='s')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File offsets.pyx:4853, in pandas._libs.tslibs.offsets.to_offset()

ValueError: 's' is not supported as period frequency.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 pd.period_range('2000', periods=3, freq='s')

File ~/pandas-dev/pandas/core/indexes/period.py:585, in period_range(start, end, periods, freq, name)
    582 if freq is None and (not isinstance(start, Period) and not isinstance(end, Period)):
    583     freq = "D"
--> 585 data, freq = PeriodArray._generate_range(start, end, periods, freq)
    586 dtype = PeriodDtype(freq)
    587 data = PeriodArray(data, dtype=dtype)

File ~/pandas-dev/pandas/core/arrays/period.py:321, in PeriodArray._generate_range(cls, start, end, periods, freq)
    318 periods = dtl.validate_periods(periods)
    320 if freq is not None:
--> 321     freq = Period._maybe_convert_freq(freq)
    323 if start is not None or end is not None:
    324     subarr, freq = _get_ordinal_range(start, end, periods, freq)

File period.pyx:1768, in pandas._libs.tslibs.period._Period._maybe_convert_freq()

File offsets.pyx:4914, in pandas._libs.tslibs.offsets.to_offset()

ValueError: Invalid frequency: s, failed to parse with error message: ValueError("'s' is not supported as period frequency.")

but 's' should definitely be supported here, right?

MarcoGorelli avatar May 02 '24 10:05 MarcoGorelli

Thanks for updating, looking better - still got a comment though, else dicts' responsibilities are being mixed

I like the good error messages you're giving here. Perhaps pd.date_range('2000', periods=2, freq='T') should also give a good error message, advising to use 'min'? (as a separate pr, though)

MarcoGorelli avatar May 29 '24 15:05 MarcoGorelli

thanks! this is probably good, will do another pass over tomorrow / in the week

MarcoGorelli avatar Jun 01 '24 17:06 MarcoGorelli

thanks! this is probably good, will do another pass over tomorrow / in the week

thank you for helping me with this PR!

natmokval avatar Jun 01 '24 17:06 natmokval

Thanks @natmokval

mroeschke avatar Jun 07 '24 17:06 mroeschke