polars icon indicating copy to clipboard operation
polars copied to clipboard

feat(rust): truncate by calendar weeks

Open cannero opened this issue 2 years ago • 4 comments

When the 'w' duration parameter is used, truncate and round by calendar weeks instead of nano seconds. Breaking change if 'w' was used to truncate by weeks.

Closes #5557

cannero avatar Dec 09 '22 15:12 cannero

I think this works. Can you fix the python tests and see if those still make sense?

ritchie46 avatar Dec 10 '22 11:12 ritchie46

The change affects now only truncate or round by weeks. groupby_rolling still behaves like before, '7d' and '1w' result both in the same outcome.

cannero avatar Dec 14 '22 21:12 cannero

@cannero how come this doesn't affect the groupby_rolling/groupby_dynamic? They utilize the same duration computations right?

ritchie46 avatar Dec 15 '22 12:12 ritchie46

Should the groupby functions also group by calendar weeks? I just have to think if a mix of days and weeks at the period and offset values makes a problem.

Right now for the groupby to work the duration check in groupby.rs had to be changed from:

if offset.nanoseconds() >= period.nanoseconds() {

to:

if offset.duration_ns() >= period.duration_ns() {

I set the PR back to draft as I have to check another groupby_rolling example.

cannero avatar Dec 15 '22 19:12 cannero

@ritchie46 Right now groupby_rolling/groupby_dynamic uses the Duration::add_ns/us/ms, which works now also with weeks. Did you expect there other changes?

One question about the function to get the nanoseconds of a Duration, because I had to change it at some places. The nanoseconds function returns only the nanoseconds without the weeks or months part, duration_ns includes also the week and month part. Should we rename the nanoseconds function, so it is not used by accident? nanoseconds is still used in rolling_window, where it makes no problem now as only index counts are now allowed.

cannero avatar Dec 21 '22 21:12 cannero

One question about the function to get the nanoseconds of a Duration, because I had to change it at some places. The nanoseconds function returns only the nanoseconds without the weeks or months part, duration_ns includes also the week and month part. Should we rename the nanoseconds function, so it is not used by accident? nanoseconds is still used in rolling_window, where it makes no problem now as only index counts are now allowed.

Python has a total_seconds() method on timedelta objects to distinguish this case; perhaps we can take similar inspiration? (eg: total_(nano|micro|milli)seconds)

alexander-beedie avatar Dec 23 '22 11:12 alexander-beedie

Should we rename the nanoseconds function, so it is not used by accident? nanoseconds is still used in rolling_window, where it makes no problem now as only index counts are now allowed.

Yes, I think it makes sense to make it clear that you get access the nsecs attribute instead of the duration in nanoseconds. I think this can be done in a follow up PR.

ritchie46 avatar Dec 27 '22 09:12 ritchie46

Thanks for the contribution @cannero.

ritchie46 avatar Dec 27 '22 09:12 ritchie46