polars
                                
                                 polars copied to clipboard
                                
                                    polars copied to clipboard
                            
                            
                            
                        fix(rust): us/ns TimeUnits downgraded to ms/us on arithmetic ops between Datetime/durations
Fix for #12023
Note that this would change the current behavior which always "downgrades" the TimeUnit to either milliseconds or microseconds (The reported issue happens also when mixing microseconds with milliseconds)
Currently failing tests:
- test_fill_null_temporal(uses- fill_nullwith 'ns' literals in 'ms' column and expects non-'ns') in- test_temporal.py
- test_lazy_query_10(subtracts nanoseconds from milliseconds and expects milliseconds) in- queries.rs
Thanks for your PR!
test_fill_null_temporal
I think this test needs updating.
On the latest release we see:
In [4]: df = pl.DataFrame({'ts': [None]}, schema={'ts': pl.Datetime('ns')})
In [5]: df
Out[5]:
shape: (1, 1)
┌──────────────┐
│ ts           │
│ ---          │
│ datetime[ns] │
╞══════════════╡
│ null         │
└──────────────┘
In [6]: df.fill_null(datetime(2020, 1, 1))
Out[6]:
shape: (1, 1)
┌─────────────────────┐
│ ts                  │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2020-01-01 00:00:00 │
└─────────────────────┘
which doesn't look right (why does fill_null upcast?), whereas on your branch one gets:
In [1]: df = pl.DataFrame({'ts': [None]}, schema={'ts': pl.Datetime('ns')})
In [2]: df.fill_null(datetime(2020, 1, 1))
Out[2]: 
shape: (1, 1)
┌─────────────────────┐
│ ts                  │
│ ---                 │
│ datetime[ns]        │
╞═════════════════════╡
│ 2020-01-01 00:00:00 │
└─────────────────────┘
which is what I'd have expected
So, I think it would be correct to update the failing tests here
Thanks for your PR!
test_fill_null_temporal
I think this test needs updating.
On the latest release we see:
In [4]: df = pl.DataFrame({'ts': [None]}, schema={'ts': pl.Datetime('ns')}) In [5]: df Out[5]: shape: (1, 1) ┌──────────────┐ │ ts │ │ --- │ │ datetime[ns] │ ╞══════════════╡ │ null │ └──────────────┘ In [6]: df.fill_null(datetime(2020, 1, 1)) Out[6]: shape: (1, 1) ┌─────────────────────┐ │ ts │ │ --- │ │ datetime[μs] │ ╞═════════════════════╡ │ 2020-01-01 00:00:00 │ └─────────────────────┘which doesn't look right (why does fill_null upcast?), whereas on your branch one gets:
In [1]: df = pl.DataFrame({'ts': [None]}, schema={'ts': pl.Datetime('ns')}) In [2]: df.fill_null(datetime(2020, 1, 1)) Out[2]: shape: (1, 1) ┌─────────────────────┐ │ ts │ │ --- │ │ datetime[ns] │ ╞═════════════════════╡ │ 2020-01-01 00:00:00 │ └─────────────────────┘which is what I'd have expected
So, I think it would be correct to update the failing tests here
Updated both failing tests to make them pass
@MarcoGorelli I will fiddle with it a bit more.
One simple way could be to get rid of (some?) parametrisation and be more verbose with more (explicit) tests.
Sadly when you mix these timeunits freely you need a bit more creativity, I couldn't find an easier way to get them on the same ground.
@MarcoGorelli I've tried to simplify the tests further:
- Made more explicit the mapping between the literals & the nanosecond conversion
- Getting nanoseconds directly from duration instead of calculating it through the multiplier (so we can get rid of the multiplier dict altogether).
- Made an explicit parameter for add-left vs add-right in the first test (less code repetition)
- Got rid of made-up terms like "offset", which was indeed confusing
For the add-left vs add-right, I could get rid of it altogether and simplify further if you want. It's not that fundamental
For the add-left vs add-right, I could get rid of it altogether and simplify further if you want. It's not that fundamental
It's really good that you've included this, thanks!
I've just pushed a commit to further remove some logic from the test (including a hand-calculated expected_nanoseconds), and have used assert_frame_equal - fancy making a similar change to the test you added for subtraction between datetimes?
For the add-left vs add-right, I could get rid of it altogether and simplify further if you want. It's not that fundamental
It's really good that you've included this, thanks!
I've just pushed a commit to further remove some logic from the test (including a hand-calculated
expected_nanoseconds), and have usedassert_frame_equal- fancy making a similar change to the test you added for subtraction between datetimes?
Your changes look great. I've done the same thing on the subtraction of datetimes. I've kept the subtraction operation explicit in the parameters there just to make it clear where those numbers come from
@ritchie46 gentle ping on this one (I think it should be ready, and fixes the linked issue nicely. The gist of the change is in crates/polars-core/src/utils/mod.rs, the rest is updating tests)
Thanks @Giuzzilla but we want to keep the existing behavior. See https://github.com/pola-rs/polars/issues/12023#issuecomment-1962050436 for explanation.