Access violation when grouped by a categorical column

Open • hanjinliu opened this issue 2 months ago • 1 comment

Checks

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

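import polars as pl
import numpy as np

# 21 evenly spaced values from 0 to 10
df = pl.DataFrame({"val": np.linspace(0, 10, 21)})
# Bin the values; the resulting "cat" column is categorical
cat = df["val"].cut([0, 2, 4, 6, 8, 10], as_series=True).alias("cat")
# Iterating over the group-by crashes the interpreter
list(df.with_columns(cat).group_by(["cat"]))  # <-- access violation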

This error happens here:

File "C:\Users\liuha\mambaforge\envs\mt\Lib\site-packages\polars\lazyframe\frame.py", line 1810 in collect
File "C:\Users\liuha\mambaforge\envs\mt\Lib\site-packages\polars\dataframe\group_by.py", line 105 in __iter__

Log output

no output

Issue description

Casting the categorical column to String before grouping works around the problem:

cat = df["val"].cut([0, 2, 4, 6, 8, 10], as_series=True).alias("cat").cast(pl.String)

I've also checked that this does not happen with polars 0.20.22.

Expected behavior

Iterating over a group-by on a categorical column should yield the groups without an access violation.

Installed versions

--------Version info---------
Polars:               0.20.23
Index type:           UInt32
Platform:             Windows-10-10.0.22621-SP0
Python:               3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 17:59:51) [MSC v.1935 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.3.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.2
nest_asyncio:         1.5.6
numpy:                1.26.4
openpyxl:             3.1.2
pandas:               2.2.2
pyarrow:              11.0.0
pydantic:             1.10.15
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

hanjinliu • Apr 30 '24 23:04

@nameexhaustion could you take a look at this one?

ritchie46 • May 01 '24 07:05