Nulls in categorical col cause mislabelled value counts

Open kevinheavey opened this issue 2 years ago • 1 comments

Polars version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Issue description

When you call value_counts on a categorical column via an expression, the result is missing the null value: the null count is attributed to another category instead, so you end up with something like this:

┌──────────────┐
│ job          │
│ ---          │
│ struct[2]    │
╞══════════════╡
│ {"waiter",1} │
│ {"doctor",1} │
│ {"doctor",3} │
└──────────────┘

Interestingly, if you call value_counts on the Series it works fine.

Reproducible example

import polars as pl

s = pl.Series("job", ["doctor", "waiter", None, None, None], pl.Categorical)
df = pl.DataFrame([s])
print(df.select(pl.col("job").value_counts()))

Expected behavior

┌──────────────┐
│ job          │
│ ---          │
│ struct[2]    │
╞══════════════╡
│ {"waiter",1} │
│ {"doctor",1} │
│ {null,3}     │
└──────────────┘
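
The expected counts can be cross-checked without Polars. A minimal sketch using collections.Counter from the standard library, with the same values as the reproducible example (the names jobs and expected_counts are illustrative and not part of any Polars API):

```python
from collections import Counter

# Same values as the reproducible example; None stands in for a Polars null.
jobs = ["doctor", "waiter", None, None, None]

# Counter treats None as an ordinary hashable key, so nulls get their own bucket
# rather than being folded into one of the categories.
expected_counts = Counter(jobs)

for value, count in expected_counts.items():
    print(value, count)
```

This only illustrates the tallies the expression should produce; it says nothing about how Polars computes them internally.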

Installed versions

---Version info---
Polars: 0.15.11
Index type: UInt32
Platform: Linux-5.15.85-1-MANJARO-x86_64-with-glibc2.36
Python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:40) [GCC 10.4.0]
---Optional dependencies---
pyarrow: 10.0.1
pandas: 1.5.2
numpy: 1.23.5
fsspec: 2022.11.0
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: 3.6.2

kevinheavey avatar Jan 03 '23 11:01 kevinheavey

Interestingly though, selecting the column and converting it to a Series before calling value_counts does give the right counts:

import polars as pl

s = pl.Series("job", ["doctor", "waiter", None, None, None], pl.Categorical)

df = pl.DataFrame([s])

# This doesn't work
print(df.select(pl.col("job").value_counts()))

# This does
print(df.select(pl.col("job")).to_series().value_counts())

Output:

shape: (3, 1)
┌──────────────┐
│ job          │
│ ---          │
│ struct[2]    │
╞══════════════╡
│ {"doctor",1} │
│ {"doctor",3} │
│ {"waiter",1} │
└──────────────┘
shape: (3, 2)
┌────────┬────────┐
│ job    ┆ counts │
│ ---    ┆ ---    │
│ cat    ┆ u32    │
╞════════╪════════╡
│ doctor ┆ 1      │
│ null   ┆ 3      │
│ waiter ┆ 1      │
└────────┴────────┘

bolshoytoster avatar Jan 03 '23 13:01 bolshoytoster