
PanicException when columns from `with_context` are not used

Open • StijnKas opened this issue 2 years ago • 0 comments

Polars version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Issue description

When I add a context to a LazyFrame with `with_context` and then run a `groupby` aggregation that doesn't use any column from that context, Polars sometimes raises a `PanicException`.

Reproducible example

import polars as pl
globals = pl.DataFrame({'OtherCol':[1, 2, 3, 4]}).lazy()
df = pl.DataFrame({'Category':[1, 1, 2, 2], 'Counts':[1, 2, 3, 4]}).lazy().with_context(globals)

df.groupby('Category').agg([(pl.col('Counts')).sum() + pl.col('OtherCol')]).collect()  # this works

┌──────────┬───────────┐
│ Category ┆ Counts    │
│ ---      ┆ ---       │
│ i64      ┆ list[i64] │
╞══════════╪═══════════╡
│ 2        ┆ [10, 11]  │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1        ┆ [4, 5]    │
└──────────┴───────────┘
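The lists in the output above can be reproduced by hand. The sketch below is plain Python (no Polars) and rests on an assumption that is not confirmed by the Polars docs: that inside the aggregation, `pl.col('OtherCol')` is gathered at each group's row positions and added element-wise to that group's `Counts` sum. It only shows that this reading is consistent with the printed values.

```python
# Input data copied from the reproducible example.
category = [1, 1, 2, 2]
counts = [1, 2, 3, 4]
other_col = [1, 2, 3, 4]  # the column supplied via with_context

# Collect the row positions belonging to each Category value.
groups = {}
for row, key in enumerate(category):
    groups.setdefault(key, []).append(row)

# Assumption: OtherCol is taken at the group's rows and added to the
# group's Counts sum, giving one list per group.
result = {}
for key, rows in groups.items():
    group_sum = sum(counts[r] for r in rows)
    result[key] = [group_sum + other_col[r] for r in rows]

print(result)  # {1: [4, 5], 2: [10, 11]}
```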


df.groupby('Category').agg([(pl.col('Counts')).sum()]).collect()  # this panics

thread '<unnamed>' panicked at 'cannot push more than 1 node', /Users/runner/work/polars/polars/polars/polars-lazy/polars-plan/src/utils.rs:66:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
/var/folders/bn/mzxdq739615fffnwsnzkjkgjfplznp/T/ipykernel_36648/3111099779.py in <cell line: 4>()
      2 globals = pl.DataFrame({'OtherCol':[1, 2, 3, 4]}).lazy()
      3 df = pl.DataFrame({'Category':[1, 1, 2, 2], 'Counts':[1, 2, 3, 4]}).lazy().with_context(globals)
----> 4 df.groupby('Category').agg([(pl.col('Counts')).sum()]).collect()

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/polars/utils.py in wrapper(*args, **kwargs)
    327         def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
    328             _rename_kwargs(fn.__name__, kwargs, aliases)
--> 329             return fn(*args, **kwargs)
    330 
    331         return wrapper

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/polars/internals/lazyframe/frame.py in collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, no_optimization, slice_pushdown, common_subplan_elimination, streaming)
   1166             streaming,
   1167         )
-> 1168         return pli.wrap_df(ldf.collect())
   1169 
   1170     def sink_parquet(

PanicException: cannot push more than 1 node

Expected behavior

I would expect Polars to simply ignore the context when none of its columns are used, so the result should be the same as running the query without `with_context`:

df = pl.DataFrame({'Category':[1, 1, 2, 2], 'Counts':[1, 2, 3, 4]}).lazy()
df.groupby('Category').agg([(pl.col('Counts')).sum()]).collect()
shape: (2, 2)
┌──────────┬────────┐
│ Category ┆ Counts │
│ ---      ┆ ---    │
│ i64      ┆ i64    │
╞══════════╪════════╡
│ 2        ┆ 7      │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1        ┆ 3      │
└──────────┴────────┘
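For reference, the expected values are just the per-category sums of `Counts`. A quick plain-Python check (no Polars involved; the data is copied from the example above):

```python
from collections import defaultdict

category = [1, 1, 2, 2]
counts = [1, 2, 3, 4]

# Sum Counts within each Category value.
totals = defaultdict(int)
for key, value in zip(category, counts):
    totals[key] += value

print(dict(totals))  # {1: 3, 2: 7}
```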

Installed versions

---Version info---
Polars: 0.15.7
Index type: UInt32
Platform: macOS-12.6.1-x86_64-i386-64bit
Python: 3.10.7 (v3.10.7:6cc6b13308, Sep  5 2022, 14:02:52) [Clang 13.0.0 (clang-1300.0.29.30)]
---Optional dependencies---
pyarrow: 10.0.1
pandas: 1.2.5
numpy: 1.23.3
fsspec: 2022.8.2
connectorx: 0.3.0
xlsx2csv: 0.8
matplotlib: 3.6.1

StijnKas, Dec 20 '22 16:12