polars
Allow list.eval to reference named columns
Problem description
I wish list.eval could reference named columns for easier filtering.
Here is an example:
import polars as pl

df = pl.DataFrame({
    'a': [0, 5, 15],
    'a_list': [[0, 5, 15]] * 3,
})
shape: (3, 2)
┌─────┬────────────┐
│ a   ┆ a_list     │
│ --- ┆ ---        │
│ i64 ┆ list[i64]  │
╞═════╪════════════╡
│ 0   ┆ [0, 5, 15] │
│ 5   ┆ [0, 5, 15] │
│ 15  ┆ [0, 5, 15] │
└─────┴────────────┘
# Filtering the lists based on a column is, however, not possible with list.eval
df.with_columns(
    list_higher_values=pl.col('a_list').list.eval(pl.element().filter(pl.element() > pl.col('a'))),
    list_all_values_except_current=pl.col('a_list').list.eval(pl.element().filter(pl.element() != pl.col('a'))),
)
# This raises: ComputeError: named columns are not allowed in `list.eval`; consider using `element` or `col("")`
Is it possible to allow referencing columns in list.eval?
The other solution is to explode the dataframe, then group it back. However, that adds some extra lines of code.
What are your thoughts?
EDIT (marco): updating syntax
I read a Stack Overflow question where referencing other columns in list.eval could make for easier syntax compared to groupby:
https://stackoverflow.com/questions/76037097/polars-element-wise-list-operations-using-another-column
I imagine this might not give any performance increase, but it would be a nice 'quality of life' improvement
This would be a great addition!
Its absence is currently the only reason I have to convert my quite large dataset to NumPy for a certain step in my data pipeline at work.
I have a List[f64] column in which I have to set values at certain indices to missing (NaN), and these indices depend on some arithmetic involving the corresponding value from another column.
I've been going in circles trying to implement this within the polars API, but keep running into this roadblock. If I have overlooked some way to do this, please do let me know! Otherwise I would be overjoyed if this feature could be implemented someday!
I've been going in circles trying to implement this within the polars API, but keep running into this roadblock
For now, if you're feeling brave, you could try writing a plugin; it'll likely be easier than you think: https://marcogorelli.github.io/polars-plugins-tutorial/lists_in_lists_out/. There's a "plugins" channel on the Polars Discord where people are happy to help: https://discord.gg/4UfP5cfBE7
That is a fantastic tutorial. Count me inspired, thank you!