polars
Enum literals evaluate to equal for CSE when one dtype is subset of another
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Issue description
If two Enum literals are used in a collect with CSE enabled, they are treated as equal even if they have different categories, as long as one's categories are a subset of the other's.
import polars as pl
dt1 = pl.Enum(["a"]) # CAUSES ERROR
# dt1 = pl.Enum(["a", "c"]) # DOES NOT CAUSE ERROR -- dt1 is not a subset of dt2
dt2 = pl.Enum(["a", "b"])
out = pl.LazyFrame().select(
    pl.lit("a", dtype=dt1).alias("dt1"),
    pl.lit("a", dtype=dt2).alias("dt2"),
).collect()
print(out.dtypes)
The second literal should have categories=['a', 'b']. Instead, both columns come back with the first dtype:
[Enum(categories=['a']), Enum(categories=['a'])]
Installed versions
main
@c-peters
There seems to be a bug in the equality check for the underlying categories. It is not exhaustive at the moment.
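To make the suspected failure mode concrete, here is a minimal pure-Python sketch of how a non-exhaustive category comparison could collapse two distinct literals during deduplication. This is not the Polars implementation: `EnumDtype`, `categories_equal_buggy`, `categories_equal_exhaustive`, and `dedup_literals` are hypothetical names invented for illustration. It also assumes the check stops at the shorter category list, which is only one way a check could be non-exhaustive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumDtype:
    """Toy stand-in for an Enum dtype (hypothetical, not the Polars type)."""
    categories: tuple[str, ...]


def categories_equal_buggy(a: EnumDtype, b: EnumDtype) -> bool:
    # Assumed bug shape: zip() stops at the shorter input, so a category
    # list that is a prefix of the other compares as equal.
    return all(x == y for x, y in zip(a.categories, b.categories))


def categories_equal_exhaustive(a: EnumDtype, b: EnumDtype) -> bool:
    # Tuple equality checks the lengths as well as every element.
    return a.categories == b.categories


def dedup_literals(literals, dtype_eq):
    """Toy CSE: reuse an earlier (value, dtype) literal if it compares equal."""
    seen = []
    out = []
    for value, dtype in literals:
        for seen_value, seen_dtype in seen:
            if value == seen_value and dtype_eq(seen_dtype, dtype):
                # Treated as a common subexpression: the earlier literal wins.
                out.append((seen_value, seen_dtype))
                break
        else:
            seen.append((value, dtype))
            out.append((value, dtype))
    return out


dt1 = EnumDtype(("a",))
dt2 = EnumDtype(("a", "b"))
literals = [("a", dt1), ("a", dt2)]

buggy = dedup_literals(literals, categories_equal_buggy)
fixed = dedup_literals(literals, categories_equal_exhaustive)

print([dtype.categories for _, dtype in buggy])
print([dtype.categories for _, dtype in fixed])
```

In this toy model the buggy comparison makes the second literal reuse the first literal's dtype, mirroring the output in the report, while the exhaustive comparison keeps both dtypes distinct. It also leaves `("a", "c")` and `("a", "b")` distinct, consistent with the commented-out variant in the reproduction.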