polars icon indicating copy to clipboard operation
polars copied to clipboard

fix(rust): us/ns TimeUnits downgraded to ms/us on arithmetic ops between Datetime/durations

Open Giuzzilla opened this issue 2 years ago • 9 comments

Fix for #12023

Note that this would change the current behavior which always "downgrades" the TimeUnit to either milliseconds or microseconds (The reported issue happens also when mixing microseconds with milliseconds)

Giuzzilla avatar Oct 28 '23 18:10 Giuzzilla

Currently failing tests:

  • test_fill_null_temporal (uses fill_null with 'ns' literals in 'ms' column and expects non-'ns') in test_temporal.py
  • test_lazy_query_10 (subtracts nanoseconds from milliseconds and expects milliseconds) in queries.rs

Giuzzilla avatar Oct 28 '23 18:10 Giuzzilla

Thanks for your PR!

test_fill_null_temporal

I think this test needs updating.

On the latest release we see:

In [4]: df = pl.DataFrame({'ts': [None]}, schema={'ts': pl.Datetime('ns')})

In [5]: df
Out[5]:
shape: (1, 1)
┌──────────────┐
│ ts           │
│ ---          │
│ datetime[ns] │
╞══════════════╡
│ null         │
└──────────────┘

In [6]: df.fill_null(datetime(2020, 1, 1))
Out[6]:
shape: (1, 1)
┌─────────────────────┐
│ ts                  │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2020-01-01 00:00:00 │
└─────────────────────┘

which doesn't look right (why does fill_null upcast?), whereas on your branch one gets:

In [1]: df = pl.DataFrame({'ts': [None]}, schema={'ts': pl.Datetime('ns')})

In [2]: df.fill_null(datetime(2020, 1, 1))
Out[2]: 
shape: (1, 1)
┌─────────────────────┐
│ ts                  │
│ ---                 │
│ datetime[ns]        │
╞═════════════════════╡
│ 2020-01-01 00:00:00 │
└─────────────────────┘

which is what I'd have expected

So, I think it would be correct to update the failing tests here

MarcoGorelli avatar Oct 29 '23 18:10 MarcoGorelli

Thanks for your PR!

test_fill_null_temporal

I think this test needs updating.

On the latest release we see:

In [4]: df = pl.DataFrame({'ts': [None]}, schema={'ts': pl.Datetime('ns')})

In [5]: df
Out[5]:
shape: (1, 1)
┌──────────────┐
│ ts           │
│ ---          │
│ datetime[ns] │
╞══════════════╡
│ null         │
└──────────────┘

In [6]: df.fill_null(datetime(2020, 1, 1))
Out[6]:
shape: (1, 1)
┌─────────────────────┐
│ ts                  │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2020-01-01 00:00:00 │
└─────────────────────┘

which doesn't look right (why does fill_null upcast?), whereas on your branch one gets:

In [1]: df = pl.DataFrame({'ts': [None]}, schema={'ts': pl.Datetime('ns')})

In [2]: df.fill_null(datetime(2020, 1, 1))
Out[2]: 
shape: (1, 1)
┌─────────────────────┐
│ ts                  │
│ ---                 │
│ datetime[ns]        │
╞═════════════════════╡
│ 2020-01-01 00:00:00 │
└─────────────────────┘

which is what I'd have expected

So, I think it would be correct to update the failing tests here

Updated both failing tests to make them pass

Giuzzilla avatar Oct 30 '23 21:10 Giuzzilla

@MarcoGorelli I will fiddle with it a bit more.

One simple way could be to get rid of (some?) parametrisation and be more verbose with more (explicit) tests.

Sadly when you mix these timeunits freely you need a bit more creativity, I couldn't find an easier way to get them on the same ground.

Giuzzilla avatar Oct 31 '23 11:10 Giuzzilla

@MarcoGorelli I've tried to simplify the tests further:

  • Made more explicit the mapping between the literals & the nanosecond conversion
  • Getting nanoseconds directly from duration instead of calculating it through the multiplier (so we can get rid of the multiplier dict altogether).
  • Made an explicit parameter for add-left vs add-right in the first test (less code repetition)
  • Got rid of made-up terms like "offset", which was indeed confusing

Giuzzilla avatar Oct 31 '23 13:10 Giuzzilla

For the add-left vs add-right, I could get rid of it altogether and simplify further if you want. It's not that fundamental

Giuzzilla avatar Oct 31 '23 13:10 Giuzzilla

For the add-left vs add-right, I could get rid of it altogether and simplify further if you want. It's not that fundamental

It's really good that you've included this, thanks!

I've just pushed a commit to further remove some logic from the test (including a hand-calculated expected_nanoseconds), and have used assert_frame_equal - fancy making a similar change to the test you added for subtraction between datetimes?

MarcoGorelli avatar Oct 31 '23 21:10 MarcoGorelli

For the add-left vs add-right, I could get rid of it altogether and simplify further if you want. It's not that fundamental

It's really good that you've included this, thanks!

I've just pushed a commit to further remove some logic from the test (including a hand-calculated expected_nanoseconds), and have used assert_frame_equal - fancy making a similar change to the test you added for subtraction between datetimes?

Your changes look great. I've done the same thing on the subtraction of datetimes. I've kept the subtraction operation explicit in the parameters there just to make it clear where those numbers come from

Giuzzilla avatar Oct 31 '23 22:10 Giuzzilla

@ritchie46 gentle ping on this one (I think it should be ready, and fixes the linked issue nicely. The gist of the change is in crates/polars-core/src/utils/mod.rs, the rest is updating tests)

MarcoGorelli avatar Dec 08 '23 09:12 MarcoGorelli

Thanks @Giuzzilla but we want to keep the existing behavior. See https://github.com/pola-rs/polars/issues/12023#issuecomment-1962050436 for explanation.

stinodego avatar Feb 23 '24 21:02 stinodego