polars
.from_pandas() fails when pandas df contains a pd.Categorical.
What language are you using? Python
What version of polars are you using? '0.13.58' (latest on conda-forge)
What language version are you using? Python 3.10.4
For example:
import pandas as pd
import polars as pl
pd_df = pd.DataFrame({"label": pd.Categorical([1, 1, 1, 2, 2]),
                      "measurement": [1, 2, 3, 4, 5]})
pd_df
pl.from_pandas(pd_df)
fails with the following error:
ComputeError: polars only support dictionaries with string like values
It actually only fails when the Categorical contains numeric values.
Polars' categorical type only maps to string categories. What do you think we should have done here? Should we cast to integers?
The error seems quite informative to me.
I'm not sure that just casting to integers is a good idea, since pandas supports ordered categories. OP's data might be a nominal representation.
I guess it is fine the way it is. I found this in an older version of polars where the error was less clear. With the latest version and the new error it is much clearer. IMO one could cast it to integers. But I am definitely fine with leaving it as it is and closing this issue.
Polars' categorical type only maps to string categories.
It would help me a lot to know what I can do as a user. I read the error message and understand what went wrong, but I do not know what to do about it.
It would be super nice if polars supported (importing) all of pandas.CategoricalDtype / pandas.Categorical and pyarrow.DictionaryArray.
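One possible user-side workaround, as a minimal sketch and not something polars itself provides: since the conversion only fails when the categories are not strings, you can relabel the categories as strings in pandas before converting. The helper name stringify_numeric_categoricals is made up for this example, and it assumes the string forms of the categories are unique.

```python
import pandas as pd


def stringify_numeric_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every categorical column has string categories.

    Hypothetical helper for illustration; not part of pandas or polars.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.CategoricalDtype):
            cats = col.cat.categories
            if not all(isinstance(c, str) for c in cats):
                # rename_categories keeps the codes and the ordered flag;
                # only the category labels change (assumes str(c) is unique).
                out[name] = col.cat.rename_categories([str(c) for c in cats])
    return out


pd_df = pd.DataFrame({"label": pd.Categorical([1, 1, 1, 2, 2]),
                      "measurement": [1, 2, 3, 4, 5]})
prepared = stringify_numeric_categoricals(pd_df)
```

The resulting frame can then be passed to pl.from_pandas(prepared), which per the report above only fails for non-string categories. If the numbers are real quantities and not labels, dropping the categorical in pandas with pd_df["label"].astype(int) before converting is the other option.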