polars
qcut throws PanicException when all values are None or NaN
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import numpy as np
import polars as pl

df = pl.DataFrame({
    'a': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'b': [np.nan, np.nan, np.nan, np.nan, np.nan, 5, 6, 7, 8, 9],
})
print(df)

def func(df: pl.DataFrame):
    df = df.with_columns([
        pl.col('b').qcut(10).to_physical().alias('c')
    ])
    return df

out = df.group_by('a', maintain_order=True).map_groups(func)
print(out)
thread '<unnamed>' panicked at D:\a\polars\polars\crates\polars-ops\src\series\ops\cut.rs:115:54:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:
Traceback (most recent call last):
File "D:\GitHub\my_quant\tests\tt2.py", line 12, in func
df = df.with_columns([
^^^^^^^^^^^^^^^^^
File "D:\Users\Kan\miniconda3\envs\py311_1\Lib\site-packages\polars\dataframe\frame.py", line 7844, in with_columns
.collect(no_optimization=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\Kan\miniconda3\envs\py311_1\Lib\site-packages\polars\utils\deprecation.py", line 95, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\Kan\miniconda3\envs\py311_1\Lib\site-packages\polars\lazyframe\frame.py", line 1695, in collect
return wrap_df(ldf.collect())
^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value
Traceback (most recent call last):
File "D:\GitHub\my_quant\tests\tt2.py", line 18, in <module>
out = df.group_by('a', maintain_order=True).map_groups(func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\Kan\miniconda3\envs\py311_1\Lib\site-packages\polars\dataframe\group_by.py", line 323, in map_groups
self.df._df.group_by_map_groups(by, function, self.maintain_order)
pyo3_runtime.PanicException: ('called `Option::unwrap()` on a `None` value',)
Issue description
qcut throws a PanicException when all values (in a group) are None or NaN.
Expected behavior
The None/NaN values should be kept, and to_physical should return -1 for them.
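To make the expected behavior concrete, here is a pure-Python sketch (not Polars code) of what the report asks for: each group's values are split into n equal-probability quantile bins, and None/NaN entries get the physical code -1 instead of causing a panic. The helper names, the linear quantile interpolation, and the right-closed (a, b] bins are assumptions chosen for illustration; the -1 code follows the reporter's expectation, not documented Polars behavior.

```python
import bisect
import math
from typing import List, Optional


def _is_missing(v: Optional[float]) -> bool:
    # Treat both None and NaN as missing, as the report does
    return v is None or math.isnan(v)


def _linear_quantile(sorted_vals: List[float], q: float) -> float:
    # Linear interpolation between the two closest ranks (assumed method)
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def expected_codes(values: List[Optional[float]], n_bins: int) -> List[int]:
    """Hypothetical helper: bin index per value, -1 for None/NaN."""
    present = sorted(v for v in values if not _is_missing(v))
    if not present:
        # All None/NaN: no breaks can be computed, so every code is -1
        return [-1] * len(values)
    # n_bins equal-probability bins need n_bins - 1 interior breaks
    breaks = [_linear_quantile(present, i / n_bins) for i in range(1, n_bins)]
    # bisect_left puts v into bin i when breaks[i-1] < v <= breaks[i]
    return [-1 if _is_missing(v) else bisect.bisect_left(breaks, v) for v in values]


# Mirror the two groups from the reproducible example
nan = float("nan")
group_1 = [nan, nan, nan, nan, nan]
group_2 = [5.0, 6.0, 7.0, 8.0, 9.0]
```

With this model, expected_codes(group_1, 10) returns five -1 codes instead of failing, and expected_codes(group_2, 2) splits at the median 7.0, giving [0, 0, 0, 1, 1].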
Installed versions
--------Version info---------
Polars: 0.19.0
Index type: UInt32
Platform: Windows-10-10.0.22621-SP0
Python: 3.11.3 | packaged by Anaconda, Inc. | (main, Apr 19 2023, 23:46:34) [MSC v.1916 64 bit (AMD64)]
----Optional dependencies----
adbc_driver_sqlite: <not installed>
cloudpickle: <not installed>
connectorx: <not installed>
deltalake: <not installed>
fsspec: <not installed>
matplotlib: 3.7.1
numpy: 1.24.3
pandas: 2.0.1
pyarrow: 12.0.0
pydantic: 1.10.7
sqlalchemy: 2.0.13
xlsx2csv: <not installed>
xlsxwriter: <not installed>
I'm planning to tackle this in the rework of qcut to bin_quantiles (for more info see here: https://github.com/pola-rs/polars/issues/10468) by relying on a total ordering of the floats (https://doc.rust-lang.org/std/primitive.f64.html#method.total_cmp). That said, I think it would be wasted work to still fix this in the soon-to-be-deprecated qcut.
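Background on the total-ordering idea: Rust's f64::partial_cmp returns None whenever a NaN is involved, whereas f64::total_cmp always returns an Ordering. The Python sketch below mirrors the bit manipulation behind total_cmp (reinterpret the float's bits as a signed integer, then flip all non-sign bits of negative values) so that every float, NaN included, gets a comparable integer key. It only illustrates the ordering; it is not Polars or Rust standard-library code, and the function names are hypothetical.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Integer key that sorts floats like Rust's f64::total_cmp.

    Resulting order: -NaN < -inf < ... < -0.0 < +0.0 < ... < +inf < +NaN
    """
    # Reinterpret the IEEE 754 bits as a signed 64-bit integer
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # bits >> 63 is -1 for negative bit patterns and 0 otherwise (arithmetic
    # shift), so only negative values get their 63 non-sign bits flipped
    return bits ^ ((bits >> 63) & 0x7FFF_FFFF_FFFF_FFFF)


def float_from_bits(bits: int) -> float:
    """Build a float from a 64-bit pattern (used to get a NaN with a known sign)."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


POS_NAN = float_from_bits(0x7FF8_0000_0000_0000)  # quiet NaN, sign bit clear

values = [3.0, POS_NAN, -1.5, math.inf, -0.0, 0.0, -math.inf]
# Sorting by the integer key never compares NaN directly, so it cannot fail
ordered = sorted(values, key=total_order_key)
```

Here ordered comes out as -inf, -1.5, -0.0, 0.0, 3.0, inf, NaN: the positive NaN sorts last instead of making the comparison undefined.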
It also panics when the df is empty.
Now, on all-null columns, it raises an error instead of panicking:
df = pl.DataFrame({"test": [None]})
df.with_columns(pl.col("test").qcut(5, labels=["q1", "q2", "q3", "q4", "q5"]))
File "/home/swang/.pyenv/versions/3.11.4/lib/python3.11/site-packages/polars/dataframe/frame.py", line 7872, in with_columns
return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/swang/.pyenv/versions/3.11.4/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1700, in collect
return wrap_df(ldf.collect())
^^^^^^^^^^^^^
polars.exceptions.ShapeError: Provide nbreaks + 1 labels
Has anyone found a workaround for this issue? I'm facing the same thing with qcut().over() when all values are null:
pl.col(f).qcut(quantiles=3, labels=["0", "2", "4"], allow_duplicates=True).over("date")
After some investigation, I found that the error happens when you pass labels to qcut while all the data is null.
There's already a test for fully null data, but it doesn't pass labels as input:
import polars as pl
from polars.testing import assert_series_equal

# this is the existing test
def test_qcut_full_null() -> None:
    s = pl.Series("a", [None, None, None, None])
    result = s.qcut([0.25, 0.50])
    expected = pl.Series("a", [None, None, None, None], dtype=pl.Categorical)
    assert_series_equal(result, expected, categorical_as_str=True)

# the new one - it fails
def test_qcut_full_null_with_labels() -> None:
    s = pl.Series("a", [None, None, None, None])
    result = s.qcut([0.25, 0.50], labels=["1", "2", "3"])
    expected = pl.Series("a", [None, None, None, None], dtype=pl.Categorical)
    assert_series_equal(result, expected, categorical_as_str=True)
The test_qcut_full_null_with_labels test fails with the same error mentioned in this issue:
FAILED tests/unit/operations/test_qcut.py::test_qcut_full_null_with_labels - polars.exceptions.ShapeError: provide len(quantiles) + 1 labels
The specific line of code causing the error is crates/polars-ops/src/series/ops/cut.rs#116:
polars_ensure!(l.len() == breaks.len() + 1, ShapeMismatch: "provide len(quantiles) + 1 labels");
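Note that the failing test passes 3 labels for 2 quantiles, so the check can only fail if breaks.len() differs from len(quantiles) on this path, presumably because no breaks can be computed from all-null data. The pure-Python sketch below (not the Rust implementation; qcut_sketch, the linear interpolation, and the right-closed bins are assumptions) shows one ordering that satisfies both tests' expectations: validate labels against the requested quantiles, which do not depend on the data, then return early for all-null input.

```python
import bisect
import math
from typing import List, Optional, Sequence


def _linear_quantile(sorted_vals: List[float], q: float) -> float:
    # Linear interpolation between the two closest ranks (assumed method)
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def qcut_sketch(
    values: Sequence[Optional[float]],
    quantiles: Sequence[float],
    labels: Optional[Sequence[str]] = None,
) -> List[Optional[str]]:
    """Hypothetical model of qcut's expected behavior, not Polars code."""
    # 1. Validate labels against the requested quantiles, which are known
    #    before looking at the data, so all-null input cannot affect it.
    if labels is not None and len(labels) != len(quantiles) + 1:
        raise ValueError("provide len(quantiles) + 1 labels")
    present = sorted(v for v in values if v is not None)
    # 2. All-null input: no breaks can be computed, so return all nulls.
    if not present:
        return [None] * len(values)
    breaks = [_linear_quantile(present, q) for q in quantiles]
    if labels is None:
        labels = [str(i) for i in range(len(breaks) + 1)]
    # 3. Right-closed bins (a, b]: bisect_left finds the first break >= v.
    return [None if v is None else labels[bisect.bisect_left(breaks, v)] for v in values]
```

For example, qcut_sketch([None, None, None, None], [0.25, 0.50], labels=["1", "2", "3"]) returns four None values, which is what test_qcut_full_null_with_labels expects.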
I'll try to fix it myself, but I'm not really a Rust guy.