polars
PanicException when the context added with `with_context` is not used
Polars version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Issue description
After adding a context to a LazyFrame with `with_context`, a groupby aggregation that does not reference any column from that context sometimes raises a PanicException.
Reproducible example
import polars as pl
globals = pl.DataFrame({'OtherCol':[1, 2, 3, 4]}).lazy()
df = pl.DataFrame({'Category':[1, 1, 2, 2], 'Counts':[1, 2, 3, 4]}).lazy().with_context(globals)
df.groupby('Category').agg([(pl.col('Counts')).sum()+pl.col('OtherCol')]).collect()  # this works
┌──────────┬───────────┐
│ Category ┆ Counts    │
│ ---      ┆ ---       │
│ i64      ┆ list[i64] │
╞══════════╪═══════════╡
│ 2        ┆ [10, 11]  │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1        ┆ [4, 5]    │
└──────────┴───────────┘
df.groupby('Category').agg([(pl.col('Counts')).sum()]).collect()  # this does not
thread '<unnamed>' panicked at 'cannot push more than 1 node', /Users/runner/work/polars/polars/polars/polars-lazy/polars-plan/src/utils.rs:66:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException Traceback (most recent call last)
/var/folders/bn/mzxdq739615fffnwsnzkjkgjfplznp/T/ipykernel_36648/3111099779.py in <cell line: 4>()
2 globals = pl.DataFrame({'OtherCol':[1, 2, 3, 4]}).lazy()
3 df = pl.DataFrame({'Category':[1, 1, 2, 2], 'Counts':[1, 2, 3, 4]}).lazy().with_context(globals)
----> 4 df.groupby('Category').agg([(pl.col('Counts')).sum()]).collect()
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/polars/utils.py in wrapper(*args, **kwargs)
327 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
328 _rename_kwargs(fn.__name__, kwargs, aliases)
--> 329 return fn(*args, **kwargs)
330
331 return wrapper
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/polars/internals/lazyframe/frame.py in collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, no_optimization, slice_pushdown, common_subplan_elimination, streaming)
1166 streaming,
1167 )
-> 1168 return pli.wrap_df(ldf.collect())
1169
1170 def sink_parquet(
PanicException: cannot push more than 1 node
Expected behavior
I would expect Polars to ignore the context when none of its columns are used, so the desired result is the same as without `with_context`:
df = pl.DataFrame({'Category':[1, 1, 2, 2], 'Counts':[1, 2, 3, 4]}).lazy()
df.groupby('Category').agg([(pl.col('Counts')).sum()]).collect()
shape: (2, 2)
┌──────────┬────────┐
│ Category ┆ Counts │
│ ---      ┆ ---    │
│ i64      ┆ i64    │
╞══════════╪════════╡
│ 2        ┆ 7      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1        ┆ 3      │
└──────────┴────────┘
Installed versions
<details>

```
---Version info---
Polars: 0.15.7
Index type: UInt32
Platform: macOS-12.6.1-x86_64-i386-64bit
Python: 3.10.7 (v3.10.7:6cc6b13308, Sep 5 2022, 14:02:52) [Clang 13.0.0 (clang-1300.0.29.30)]
---Optional dependencies---
pyarrow: 10.0.1
pandas: 1.2.5
numpy: 1.23.3
fsspec: 2022.8.2
connectorx: 0.3.0
xlsx2csv: 0.8
matplotlib: 3.6.1
```

</details>