bastionlab
Panic: polars, sort on a list
Repro:
1)
import polars as pl
from bastionlab.polars.policy import Policy, TrueRule, Log
repro_df = pl.DataFrame({ "hello": [1, 2, 3], "world": [1, 2, 3] })
repro_rdf = connection.client.polars.send_df(repro_df, policy=Policy(safe_zone=TrueRule(), unsafe_handling=Log(), savable=False))
(
repro_rdf
.groupby(pl.col("hello"))
.agg(pl.col("world")) # replace with .agg(pl.col("world").sum()) for it to work
.sort(pl.col("world"))
).collect().fetch()
results in
thread 'tokio-runtime-worker' panicked at 'this operation is not implemented/valid for this dtype: List(Int64)', /home/cchudant/.cargo/registry/src/github.com-1ecc6299db9ec823/polars-core-0.25.1/src/series/series_trait.rs:427:9
The panic originates inside polars, so I'm not sure how to proceed here.
I believe the operation you're trying to perform does not really make sense. The aggregation step does not involve an aggregation function, so you get the list of values in each group for the "world" column. You then try to sort by that same column, which fails because polars doesn't know how to sort lists.
>>> import polars as pl
>>> df = pl.DataFrame({ "hello": [1,2,3], "world": [1,2,3]})
>>> df.lazy().groupby(pl.col("hello")).agg(pl.col("world")).collect()
shape: (3, 2)
┌───────┬───────────┐
│ hello ┆ world     │
│ ---   ┆ ---       │
│ i64   ┆ list[i64] │
╞═══════╪═══════════╡
│ 2     ┆ [2]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 3     ┆ [3]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1     ┆ [1]       │
└───────┴───────────┘
>>> df.lazy().groupby(pl.col("hello")).agg(pl.col("world")).sort(pl.col("world")).collect()
thread '<unnamed>' panicked at 'this operation is not implemented/valid for this dtype: List(Int64)', /home/runner/work/polars/polars/polars/polars-core/src/series/series_trait.rs:427:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dhalf/Documents/bastionlab/env/lib/python3.10/site-packages/polars/internals/lazyframe/frame.py", line 920, in collect
    return pli.wrap_df(ldf.collect())
pyo3_runtime.PanicException: this operation is not implemented/valid for this dtype: List(Int64)
That said, we can probably improve error reporting.
Should I close this issue and reopen it in polars instead, then?