Allow list.eval to reference named columns

Open lucazanna opened this issue 2 years ago • 1 comments

Problem description

I wish list.eval could reference named columns for easier filtering.

Here is an example:

import polars as pl

df = pl.DataFrame({
    'a': [0, 5, 15],
    'a_list': [[0, 5, 15]] * 3,
})

shape: (3, 2)
┌─────┬────────────┐
│ a   ┆ a_list     │
│ --- ┆ ---        │
│ i64 ┆ list[i64]  │
╞═════╪════════════╡
│ 0   ┆ [0, 5, 15] │
│ 5   ┆ [0, 5, 15] │
│ 15  ┆ [0, 5, 15] │
└─────┴────────────┘

# Filtering the lists based on another column is, however, not possible with list.eval

df.with_columns(
    list_higher_values = pl.col('a_list').list.eval(pl.element().filter(pl.element() > pl.col('a'))),
    list_all_values_except_current = pl.col('a_list').list.eval(pl.element().filter(pl.element() != pl.col('a')))
)

# this returns an error: ComputeError: named columns are not allowed in `list.eval`; consider using `element` or `col("")`

Would it be possible to allow referencing columns in list.eval?

The other solution is to explode the DataFrame and then group it back. However, that adds some extra lines of code.

What are your thoughts?

EDIT (marco): updating syntax

lucazanna avatar Feb 26 '23 22:02 lucazanna

I read a Stack Overflow question where referencing other columns in arr.eval could make for an easier syntax compared to groupby: https://stackoverflow.com/questions/76037097/polars-element-wise-list-operations-using-another-column

I imagine this might not give any performance increase, but it would be a nice 'quality of life' improvement.

lucazanna avatar Apr 17 '23 16:04 lucazanna

This would be a great addition!

Its absence is currently the only reason I have to convert my quite large dataset to NumPy for one step of my data pipeline at work.

I have a List[f64] column in which I have to set values at certain indices to missing (NaN), and these indices depend on some arithmetic involving the corresponding value from another column.

I've been going in circles trying to implement this within the polars API, but keep running into this roadblock. If I have overlooked some way to do this, please do let me know! Otherwise I would be overjoyed if this feature could be implemented someday!

dashdeckers avatar Mar 04 '24 23:03 dashdeckers

> I've been going in circles trying to implement this within the polars API, but keep running into this roadblock

For now, if you're feeling brave, you could try writing a plugin, it'll likely be easier than you think: https://marcogorelli.github.io/polars-plugins-tutorial/lists_in_lists_out/ . There's a "plugins" channel on the Polars discord where people are happy to help https://discord.gg/4UfP5cfBE7

MarcoGorelli avatar Mar 05 '24 10:03 MarcoGorelli

That is a fantastic tutorial. Count me inspired, thank you!

dashdeckers avatar Mar 08 '24 10:03 dashdeckers