polars
polars copied to clipboard
feat(python): Allow `pl.element()` in `Series.filter`
Closes #13772.
This allows a Series to self-reference via pl.element() in filter:
>>> pl.Series([1, 2, 3]).filter(pl.element() < 3)
shape: (2,)
Series: '' [i64]
[
1
2
]
Note that we now go entirely through the expression system for Series.filter, so I have removed the associated rust binding to the underlying Series.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 80.42%. Comparing base (
41d3048) to head (663d3f8). Report is 148 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #18270 +/- ##
==========================================
- Coverage 80.42% 80.42% -0.01%
==========================================
Files 1492 1492
Lines 198675 198666 -9
Branches 2841 2840 -1
==========================================
- Hits 159785 159772 -13
- Misses 38365 38369 +4
Partials 525 525
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This looks nicer! Thanks for accepting the suggestions!
@orlp can you help to review this PR? Thanks!
@MarcoGorelli Hi Marco, sorry to interrupt but can you help to review this PR? I think it is a valuable change, and it's a concise one which won't take much review time. I see this PR not getting reviewed for quite a few days, hence asking.
Is this PR actively reviewed? I see it hang for quite a while. Can the owners take a look? I think this is a valuable and concise change which won't take much review time.
This is not something we can support as is. This PR diverges the Series behavior from that of Expr.
I am also not a fan of how element is implemented on the Polars side and want to fix that first.
This is not something we can support as is. This PR diverges the
Seriesbehavior from that ofSeries.I am also not a fan of how
elementis implemented on the Polars side and want to fix that first.
Can you elaborate on the meaning of "diverges the Series behavior from that of Series" ?
I meant Expr. Edited.
I meant
Expr. Edited.
I'm still confused. Aren't they already "diverged" because their APIs are already quite different anyways? E.g. Expr.filter is for DataFrame and it works across multiple columns, which behaves more like DataFrame.filter:
import polars as pl
df = pl.DataFrame(
{
"a": [1, 2, 3],
"b": [4, 5, 6],
}
)
print(df)
# shape: (3, 2)
# ┌─────┬─────┐
# │ a ┆ b │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1 ┆ 4 │
# │ 2 ┆ 5 │
# │ 3 ┆ 6 │
# └─────┴─────┘
print(df.select(pl.col("a").filter(pl.col("b") == 5)))
# shape: (1, 1)
# ┌─────┐
# │ a │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 2 │
# └─────┘
while Series only has one column and the filter only works for that column. In my opinion, the change proposed by this PR indeed brings the two signatures closer, rather than "more diverged". The use of pl.element in Series.filter is also more aligned and consistent with how it is used in Series.list.eval. It is also more consistent with the high level API and structural design of polars: Series, in terms of elementwise operations, should be considered as a special case of an unnamed single-column DataFrame, with a flattened shape (i.e. (n, 1) => (n,) ). Isn't that the right way to unify most of the DataFrame <=> Series APIs? Based on these reasons, I think this PR promotes unification rather than divergence.
@ritchie46 Can you give an example to illustrate how it (this change) diverges Series.filter from Expr.filter, and how they are "not diverged" as of the current status?
@ritchie46 In terms of the current implementation of pl.element (a syntax sugar for an unnamed column): I understand that it's not ideal and it goes slightly beyond the scope of "an element of some name-agnostic 1D-array-like object (list, Series, etc)". Do you have any idea in mind to make it better and more strict?
@ritchie46 Can you give an example to illustrate how it (this change) diverges Series.filter from Expr.filter, and how they are "not diverged" as of the current status?
It doesn't work in expressions, but only in Series. I will close this as it isn't the way forward and the architecture of pl.element isn't correct yet.
Do you have any idea in mind to make it better and more strict?
I don't know yet, I am not working on this atm.