
fix(python): Allow `pl.col(pl.Enum)` for selecting all Enum columns

collinprince opened this issue · 4 comments

Update extract to create an empty `pl.Enum`, so that column expressions can be built for the `pl.Enum` datatype, e.g. `pl.col(pl.Enum)`.

Also update the `Enum` constructor so that the `categories` parameter accepts, and defaults to, `None`. This mirrors the logic used in extract for `pl.Enum` and serves as a convenient shorthand for the currently supported approach of passing in an empty Series.
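A minimal pure-Python sketch of the proposed behavior, assuming a simplified dtype model: `Categorical`, `Enum`, and `extract_dtype` below are hypothetical stand-ins, not the real Polars internals. The idea is that a bare class passed to the selector is turned into an instance with default arguments, which only works once `categories` defaults to `None`.

```python
from typing import Optional, Sequence, Union


class Categorical:
    """Hypothetical stand-in for a categorical dtype."""


class Enum:
    """Hypothetical stand-in for pl.Enum; categories default to None as proposed."""

    def __init__(self, categories: Optional[Sequence[str]] = None) -> None:
        # None is a shorthand for "no categories", i.e. an empty Enum.
        self.categories = list(categories) if categories is not None else []


def extract_dtype(dtype: Union[type, Categorical, Enum]):
    """Hypothetical 'extract' step: turn a dtype class into a dtype instance."""
    if isinstance(dtype, type):
        # A bare class such as Enum is instantiated with its defaults,
        # so Enum -> Enum(categories=None) -> an empty Enum.
        return dtype()
    # Instances are passed through unchanged.
    return dtype


e = extract_dtype(Enum)
print(type(e).__name__, e.categories)  # Enum []
```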

Fixes #13269

collinprince · Jan 21 '24

https://github.com/pola-rs/polars/blob/f93e4505157905ea159054ce9a8e2cf091acb823/crates/polars-core/src/datatypes/dtype.rs#L76

The problem is the equality check here. We need to distinguish Enum from Categorical: right now, both `df.select(pl.col(pl.Enum))` and `df.select(pl.col(pl.Categorical))` return the Categorical and the Enum columns, so the equality check on the datatype has to change.
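To illustrate the symptom, here is a small model of the old design, in which an Enum was a Categorical whose reverse mapping carried an "is enum" flag. `RevMap`, `CategoricalDType`, `loose_eq`, and `select_by_dtype` are hypothetical names; the sketch only assumes, as described above, that equality compared the Categorical variant and ignored the rev map.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RevMap:
    """Hypothetical reverse mapping; is_enum marks an Enum-backed mapping."""
    is_enum: bool


@dataclass(frozen=True)
class CategoricalDType:
    """Hypothetical old-style dtype: an Enum is a Categorical with a flagged rev map."""
    rev_map: Optional[RevMap] = None

    def loose_eq(self, other: "CategoricalDType") -> bool:
        # Mirrors the problematic check: any two Categorical dtypes compare
        # equal, whether or not either of them is really an Enum.
        return isinstance(other, CategoricalDType)


def select_by_dtype(schema: Dict[str, CategoricalDType], target: CategoricalDType) -> List[str]:
    """Return the names of columns whose dtype compares equal to target."""
    return [name for name, dtype in schema.items() if dtype.loose_eq(target)]


schema = {
    "cat_col": CategoricalDType(RevMap(is_enum=False)),
    "enum_col": CategoricalDType(RevMap(is_enum=True)),
}
enum_target = CategoricalDType(RevMap(is_enum=True))
print(select_by_dtype(schema, enum_target))  # ['cat_col', 'enum_col']
```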

c-peters · Jan 24 '24

@c-peters @ritchie46 Updated the code to handle equality of Enum vs. Categorical, though it feels a bit awkward: every rev-map comparison that does not involve an Enum still has to be treated as equal.

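```rust
#[cfg(feature = "dtype-categorical")]
(Categorical(rev_l, _), Categorical(rev_r, _)) => {
    // A missing rev map counts as "not an Enum".
    let is_l_enum = rev_l.as_ref().map_or(false, |x| x.is_enum());
    let is_r_enum = rev_r.as_ref().map_or(false, |x| x.is_enum());
    // Equal only if both sides are Enums or both are plain Categoricals.
    is_l_enum == is_r_enum
},
```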
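To make the truth table of that match arm explicit, here is a line-for-line Python transcription. `RevMap` and `categorical_eq` are hypothetical names used only for illustration; they are not the Polars `RevMapping` type.

```python
from typing import Optional


class RevMap:
    """Hypothetical reverse mapping; is_enum() marks an Enum-backed mapping."""

    def __init__(self, enum: bool) -> None:
        self._enum = enum

    def is_enum(self) -> bool:
        return self._enum


def categorical_eq(rev_l: Optional[RevMap], rev_r: Optional[RevMap]) -> bool:
    """Equal exactly when both or neither side is an Enum.

    A missing rev map (None) counts as "not an Enum", mirroring
    map_or(false, ...); any other rev-map difference is ignored.
    """
    is_l_enum = rev_l is not None and rev_l.is_enum()
    is_r_enum = rev_r is not None and rev_r.is_enum()
    return is_l_enum == is_r_enum


plain, enum = RevMap(False), RevMap(True)
print(categorical_eq(None, plain))  # True
print(categorical_eq(plain, enum))  # False
print(categorical_eq(enum, enum))   # True
print(categorical_eq(None, enum))   # False
```

This is exactly the awkwardness noted in the comment: two plain Categoricals with different rev maps still compare equal, and only the Enum flag is allowed to break equality.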

collinprince · Jan 24 '24

Yes, this is not ideal. I'm working on making Enum an actual datatype so as to avoid this cumbersome rev_map check.
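Once Enum is its own datatype, telling it apart from Categorical no longer requires inspecting the reverse mapping. A sketch with hypothetical classes (not the Polars internals), assuming kind-level matching ignores the specific categories:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Categorical:
    """Hypothetical stand-in for the Categorical dtype."""


@dataclass(frozen=True)
class Enum:
    """Hypothetical stand-in for a standalone Enum dtype with fixed categories."""
    categories: tuple = ()


def same_dtype_kind(left: object, right: object) -> bool:
    # With Enum as a separate type, the Enum/Categorical distinction is a
    # plain type comparison; no rev-map inspection is needed.
    return type(left) is type(right)


print(same_dtype_kind(Enum(("a", "b")), Categorical()))  # False
print(same_dtype_kind(Enum(("a",)), Enum(("x", "y"))))   # True
```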

c-peters · Jan 26 '24

@collinprince, Enum is now an actual data type; could you resolve the merge conflicts?

c-peters · Jan 26 '24

should be good now @c-peters

collinprince · Jan 28 '24

This is superseded by #14628. We do not allow an empty Enum, because the categories should be present when defining the datatype. You can select the columns with the class itself, i.e. `pl.col(pl.Enum)`.
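The design described here (required categories, selection by class) can be sketched as follows. `Enum`, `Categorical`, and `select_columns` are hypothetical stand-ins, not the Polars API; the sketch assumes that passing a class matches every column of that dtype kind, while passing an instance matches only columns whose dtype is equal to it.

```python
from typing import Dict, List, Sequence


class Categorical:
    """Hypothetical stand-in for the Categorical dtype."""


class Enum:
    """Hypothetical stand-in for pl.Enum: categories are a required argument."""

    def __init__(self, categories: Sequence[str]) -> None:
        self.categories = tuple(categories)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Enum) and self.categories == other.categories

    def __hash__(self) -> int:
        return hash(self.categories)


def select_columns(schema: Dict[str, object], target: object) -> List[str]:
    """Hypothetical selector: a class matches every column of that type,
    an instance matches only columns with an equal dtype."""
    if isinstance(target, type):
        return [name for name, dtype in schema.items() if isinstance(dtype, target)]
    return [name for name, dtype in schema.items() if dtype == target]


schema = {
    "size": Enum(["S", "M", "L"]),
    "color": Enum(["red", "green"]),
    "tag": Categorical(),
}
print(select_columns(schema, Enum))                   # ['size', 'color']
print(select_columns(schema, Enum(["S", "M", "L"])))  # ['size']
```

Because `categories` has no default, `Enum()` raises a `TypeError`, so an Enum without categories cannot be constructed, while class-level selection still covers every Enum column.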

c-peters · Feb 21 '24