polars
fix(python): Allow `pl.col(pl.Enum)` for selecting all Enum columns
Update `extract` to create an empty `pl.Enum` so that column expressions can be extracted for the `pl.Enum` datatype, e.g. `pl.col(pl.Enum)`.
Also update the `Enum` constructor to allow (and default to) `None` for the `categories` param. This mirrors the logic used in `extract` for `pl.Enum` and serves as a convenient shorthand for the currently supported approach of passing in an empty series.
Fixes #13269
https://github.com/pola-rs/polars/blob/f93e4505157905ea159054ce9a8e2cf091acb823/crates/polars-core/src/datatypes/dtype.rs#L76
The problem is the equality check here. We need to distinguish Enum from Categorical. Right now, if you do `df.select(pl.col(Enum))` or `df.select(pl.col(Categorical))`, you get both categorical and enum columns. We need to alter the equality check on the datatype.
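To make the symptom concrete, here is a minimal, self-contained sketch of why dtype-based selection returns both kinds of column. `RevMap`, `DataType` and `select_by_dtype` are simplified, hypothetical stand-ins and not the real polars-core types; the sketch assumes the equality arm ignores the rev map, which matches the behavior described above.

```rust
// Hypothetical, simplified stand-ins for the polars-core types (assumption).
#[derive(Debug, Clone)]
enum RevMap {
    Local, // rev map of an ordinary Categorical
    Enum,  // rev map of an Enum
}

#[derive(Debug, Clone)]
enum DataType {
    Int64,
    Categorical(Option<RevMap>),
}

impl PartialEq for DataType {
    fn eq(&self, other: &Self) -> bool {
        use DataType::*;
        match (self, other) {
            (Int64, Int64) => true,
            // The rev map is ignored, so an Enum-backed dtype compares
            // equal to a plain Categorical one.
            (Categorical(_), Categorical(_)) => true,
            _ => false,
        }
    }
}

/// Return the names of all columns whose dtype equals `wanted`.
fn select_by_dtype<'a>(schema: &[(&'a str, DataType)], wanted: &DataType) -> Vec<&'a str> {
    schema
        .iter()
        .filter(|(_, dt)| dt == wanted)
        .map(|(name, _)| *name)
        .collect()
}
```

With a schema holding one plain categorical column and one enum column, asking for either dtype returns both names, because the comparison never looks at the rev map.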
@c-peters @ritchie46
Updated the code to handle equality of Enum vs. Categorical. It feels a bit awkward, though, because every other rev-map comparison (anything not involving an Enum) still has to be treated as equal:
    #[cfg(feature = "dtype-categorical")]
    (Categorical(rev_l, _), Categorical(rev_r, _)) => {
        let is_l_enum = rev_l.as_ref().map_or(false, |x| x.is_enum());
        let is_r_enum = rev_r.as_ref().map_or(false, |x| x.is_enum());
        is_l_enum == is_r_enum
    },
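For anyone who wants to poke at the behavior of this arm in isolation, below is a self-contained sketch that embeds the same check in a toy `PartialEq` impl. The `RevMap` and `DataType` definitions are hypothetical simplifications (the real `Categorical` variant carries a second field and an `Arc`-wrapped rev map); only the three lines inside the `Categorical` arm are taken from the snippet above.

```rust
// Hypothetical, simplified stand-ins for the polars-core types (assumption).
#[derive(Debug, Clone)]
enum RevMap {
    Local,
    Enum,
}

impl RevMap {
    fn is_enum(&self) -> bool {
        matches!(self, RevMap::Enum)
    }
}

#[derive(Debug, Clone)]
enum DataType {
    Int64,
    Categorical(Option<RevMap>),
}

impl PartialEq for DataType {
    fn eq(&self, other: &Self) -> bool {
        use DataType::*;
        match (self, other) {
            (Int64, Int64) => true,
            (Categorical(rev_l), Categorical(rev_r)) => {
                // A missing rev map counts as "not an enum".
                let is_l_enum = rev_l.as_ref().map_or(false, |x| x.is_enum());
                let is_r_enum = rev_r.as_ref().map_or(false, |x| x.is_enum());
                // Equal only when both sides are enums or neither is.
                is_l_enum == is_r_enum
            },
            _ => false,
        }
    }
}
```

Under this rule a Categorical with no rev map still equals a Categorical with a local one, while neither equals an Enum. Note that two Enums compare equal regardless of their categories, since only the enum flag is inspected.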
Yes, this is not ideal. I'm working on making Enum an actual datatype to avoid this cumbersome rev_map check.
@collinprince, `Enum` is now an actual data type. Could you resolve the merge conflicts?
should be good now @c-peters
This is superseded by #14628. We do not allow an empty `Enum`, because the categories should be present when defining the datatype. You can select the columns with the class itself.
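As a rough illustration of the idea of selecting by class rather than by a fully specified dtype, here is a toy model. Everything in it (`DataType`, `DataTypeClass`, `class`, `select_by_class`) is hypothetical and is not how #14628 or polars implements it; it only shows that once Enum is its own variant carrying categories, a class-level match can pick up every Enum column without needing an empty instance.

```rust
// Hypothetical model: Enum is its own variant and always carries categories.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Categorical,
    Enum(Vec<String>),
}

// Hypothetical "class" of a dtype, i.e. the variant without its parameters.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataTypeClass {
    Int64,
    Categorical,
    Enum,
}

impl DataType {
    fn class(&self) -> DataTypeClass {
        match self {
            DataType::Int64 => DataTypeClass::Int64,
            DataType::Categorical => DataTypeClass::Categorical,
            DataType::Enum(_) => DataTypeClass::Enum,
        }
    }
}

/// Return the names of all columns whose dtype belongs to class `wanted`.
fn select_by_class<'a>(schema: &[(&'a str, DataType)], wanted: DataTypeClass) -> Vec<&'a str> {
    schema
        .iter()
        .filter(|(_, dt)| dt.class() == wanted)
        .map(|(name, _)| *name)
        .collect()
}
```

In this model two Enum dtypes with different categories are unequal as dtypes, yet both match the `Enum` class, and Categorical columns are no longer swept in.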