Enum literals evaluate to equal for CSE when one dtype is subset of another

Open mcrumiller opened this issue 11 months ago • 2 comments

Checks

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of Polars.

Issue description

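If literals of two different Enum dtypes are used in a collect with CSE (common subexpression elimination) enabled, the literals are treated as equal even though their dtypes have different categories, as long as one dtype's categories are a subset of the other's. The second literal then takes on the first literal's dtype.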

import polars as pl

dt1 = pl.Enum(["a"])         # TRIGGERS THE BUG -- dt1's categories are a subset of dt2's
# dt1 = pl.Enum(["a", "c"])  # DOES NOT TRIGGER THE BUG -- dt1 is not a subset of dt2
dt2 = pl.Enum(["a", "b"])

out = pl.LazyFrame().select(
    pl.lit("a", dtype=dt1).alias("dt1"),
    pl.lit("a", dtype=dt2).alias("dt2"),
).collect()

print(out.dtypes)

The second literal should have categories=['a', 'b'], but the output is:

[Enum(categories=['a']), Enum(categories=['a'])]

Installed versions

main

mcrumiller, Mar 19 '24 19:03

@c-peters

mcrumiller, Mar 19 '24 19:03

There seems to be a bug in the equality check for the underlying categories: it is not exhaustive at the moment.

c-peters, Mar 23 '24 18:03
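
To illustrate what a non-exhaustive equality check on categories could look like, here is a minimal pure-Python sketch. It is not the Polars implementation; the function names `non_exhaustive_equal` and `exhaustive_equal` are hypothetical, and the sketch assumes the flawed comparison stops at the end of the shorter category list. That assumption is consistent with the behaviour reported above (`["a"]` vs `["a", "b"]` compares equal, `["a", "c"]` vs `["a", "b"]` does not), but the actual cause may differ.

```python
def non_exhaustive_equal(left, right):
    """Hypothetical flawed check: zip() stops at the shorter list,
    so trailing categories in the longer list are never compared."""
    return all(a == b for a, b in zip(left, right))


def exhaustive_equal(left, right):
    """Compare lengths first, then every category pairwise."""
    return len(left) == len(right) and all(a == b for a, b in zip(left, right))


cats1 = ["a"]        # categories of dt1 in the report
cats2 = ["a", "b"]   # categories of dt2 in the report
cats3 = ["a", "c"]   # the alternative dt1 that does not trigger the bug

print(non_exhaustive_equal(cats1, cats2))  # True: wrongly treated as equal
print(exhaustive_equal(cats1, cats2))      # False
print(non_exhaustive_equal(cats3, cats2))  # False: mismatch at position 1
```

With a check like the first one, two literals that differ only in dtype could be merged as a common subexpression, which would match the symptom in this issue.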