mljar-supervised
Failure to properly preprocess categorical data
There are some categorical columns in my dataset that are stored as numbers, so I checked the data_info.json file to see whether they were preprocessed as categorical. Unfortunately, none of them were recognized as categorical by mljar. I then used the following code to convert these columns to categorical manually:
with open('enum.txt', 'r') as enum_file:
    categorical_columns = enum_file.read().splitlines()
for col in categorical_columns:
    df[col] = df[col].astype("category")
After doing this, I got an error:
ValueError: pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: target: category
It seems that mljar can't preprocess categorical data stored as numbers.
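The error message above names only `target`, which suggests the target column was among those converted to the category dtype. The following is a minimal sketch of converting just the feature columns and leaving the target numeric. The DataFrame, its column names, and the in-line list standing in for enum.txt are hypothetical, and whether this avoids the error in mljar is an assumption, not something confirmed here.

```python
import pandas as pd

# Hypothetical data: "color_code" is a categorical feature stored as integers.
df = pd.DataFrame({
    "color_code": [1, 2, 1, 3],
    "size": [0.5, 1.5, 2.5, 3.5],
    "target": [0, 1, 0, 1],
})

# Stand-in for the lines read from enum.txt; note that it lists the target too.
categorical_columns = ["color_code", "target"]
target_column = "target"  # assumed name, taken from the error message

# Convert only the feature columns; the target keeps its numeric dtype.
feature_columns = [c for c in categorical_columns if c != target_column]
for col in feature_columns:
    df[col] = df[col].astype("category")

print(df.dtypes)
```

If the target really is a class label, it can stay as integers while the features carry the category dtype.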
It should handle the category data type. This might be a bug.