polars
Access violation when grouped by a categorical column
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl
import numpy as np
df = pl.DataFrame({"val": np.linspace(0,10, 21)})
cat = df["val"].cut([0, 2, 4, 6, 8, 10], as_series=True).alias("cat")
list(df.with_columns(cat).group_by(["cat"])) # <-- access violation
The crash occurs in these frames:
File "C:\Users\liuha\mambaforge\envs\mt\Lib\site-packages\polars\lazyframe\frame.py", line 1810 in collect
File "C:\Users\liuha\mambaforge\envs\mt\Lib\site-packages\polars\dataframe\group_by.py", line 105 in __iter__
Log output
no output
Issue description
Casting the categorical column to String avoids the crash:
cat = df["val"].cut([0, 2, 4, 6, 8, 10], as_series=True).alias("cat").cast(pl.String)
I have also confirmed that this does not happen with polars 0.20.22.
Expected behavior
Iterating over the group_by should yield one group per category instead of crashing the process with an access violation.
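For reference, the grouping the reproducer should produce can be sketched in pure Python, without Polars. This sketch assumes `cut` uses its default right-closed intervals (`left_closed=False`), so a value equal to a break lands in the bin that ends at it. The helper names `cut_labels` and `group_by_cut` are hypothetical, and the label strings only approximate Polars' formatting.

```python
from bisect import bisect_left
from collections import defaultdict


def cut_labels(breaks):
    """Hypothetical helper: labels for right-closed bins (-inf, b0], ..., (bn, inf]."""
    edges = [float("-inf"), *breaks, float("inf")]
    return [f"({lo}, {hi}]" for lo, hi in zip(edges, edges[1:])]


def group_by_cut(values, breaks):
    """Hypothetical helper: group values by the right-closed bin they fall into."""
    labels = cut_labels(breaks)
    groups = defaultdict(list)
    for v in values:
        # bisect_left sends a value equal to a break into the bin ending at it,
        # i.e. bins are open on the left and closed on the right.
        groups[labels[bisect_left(breaks, v)]].append(v)
    return dict(groups)


values = [i * 0.5 for i in range(21)]  # same values as np.linspace(0, 10, 21)
groups = group_by_cut(values, [0, 2, 4, 6, 8, 10])
for label, members in groups.items():
    print(label, len(members))
```

With these breaks, one value falls in "(-inf, 0]", four values fall in each bin from "(0, 2]" through "(8, 10]", and the "(10, inf]" bin stays empty. The expected result is therefore six non-empty groups.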
Installed versions
--------Version info---------
Polars: 0.20.23
Index type: UInt32
Platform: Windows-10-10.0.22621-SP0
Python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 17:59:51) [MSC v.1935 64 bit (AMD64)]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: 2.2.1
connectorx: <not installed>
deltalake: <not installed>
fastexcel: <not installed>
fsspec: 2023.3.0
gevent: <not installed>
hvplot: <not installed>
matplotlib: 3.8.2
nest_asyncio: 1.5.6
numpy: 1.26.4
openpyxl: 3.1.2
pandas: 2.2.2
pyarrow: 11.0.0
pydantic: 1.10.15
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: <not installed>
xlsx2csv: <not installed>
xlsxwriter: <not installed>
@nameexhaustion could you take a look at this one?