polars
Confusing (& wrong) behavior when using `with_columns` incorrectly
Polars version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Issue description
I accidentally wrote this code:
import polars as pl
df = pl.DataFrame({
'x1': [1,2,4,8,16,32],
'x2': [1,2,3,4,5,6]
})
df.with_columns(pctChange = pl.col(['x1', 'x2']).pct_change())
>>>
shape: (6, 3)
┌─────┬─────┬───────────┐
│ x1 ┆ x2 ┆ pctChange │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ f64 │
╞═════╪═════╪═══════════╡
│ 1 ┆ 1 ┆ null │
│ 2 ┆ 2 ┆ 1.0 │
│ 4 ┆ 3 ┆ 0.5 │
│ 8 ┆ 4 ┆ 0.333333 │
│ 16 ┆ 5 ┆ 0.25 │
│ 32 ┆ 6 ┆ 0.2 │
└─────┴─────┴───────────┘
This is the result I'd expect if I were taking the pct_change of the x2 column alone, but it quietly ignores x1.
Two behaviours seem appropriate to me:
- Raise an error when a multi-column expression is assigned to a single column name
- Create a struct type column.
# Should behave like df.with_columns(pctChange = pl.struct(pl.col(['x1', 'x2']).pct_change()))
df.with_columns(pctChange = pl.col(['x1', 'x2']).pct_change())
>>>
shape: (6, 3)
┌─────┬─────┬────────────────┐
│ x1 ┆ x2 ┆ pctChange │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ struct[2] │
╞═════╪═════╪════════════════╡
│ 1 ┆ 1 ┆ {null,null} │
│ 2 ┆ 2 ┆ {1.0,1.0} │
│ 4 ┆ 3 ┆ {1.0,0.5} │
│ 8 ┆ 4 ┆ {1.0,0.333333} │
│ 16 ┆ 5 ┆ {1.0,0.25} │
│ 32 ┆ 6 ┆ {1.0,0.2} │
└─────┴─────┴────────────────┘
In either case, the current behavior definitely violates the "don't surprise programmers" mantra.
Reproducible example
import polars as pl
df = pl.DataFrame({
'x1': [1,2,4,8,16,32],
'x2': [1,2,3,4,5,6]
})
df.with_columns(pctChange = pl.col(['x1', 'x2']).pct_change())
Expected behavior
Should return the same as
import polars as pl
df = pl.DataFrame({
'x1': [1,2,4,8,16,32],
'x2': [1,2,3,4,5,6]
})
df.with_columns(pctChange = pl.struct(pl.col(['x1', 'x2']).pct_change()))
Or raise an error
Installed versions
---Version info---
Polars: 0.15.15
Index type: UInt32
Platform: Windows-10-10.0.22621-SP0
Python: 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
---Optional dependencies---
pyarrow: 8.0.0
pandas: 1.5.2
numpy: 1.22.4
fsspec: 2022.8.2
connectorx: 0.3.1
xlsx2csv: <not installed>
matplotlib: 3.6.2
@alexander-beedie could you take this one? This is related to the keyword argument assignment.
@mkleinbort-ic You can use the explicit alias() until this is fixed.
I'm happy on my end, it's just a sharp corner I thought I'd raise
@mkleinbort-ic: and many thanks for that - I've found a way to automatically structify this type of call (which does look like the right way to handle things), so the hoped-for behaviour should work by default in an upcoming release.
Update: note that the auto-structify behaviour is considered experimental, and requires opt-in via:
pl.Config.set_auto_structify(True)