mljar-supervised

Failure to properly preprocess categorical data

Open • williamty opened this issue 1 year ago • 1 comment

Some categorical columns in my dataset are stored as numbers, so I checked the data_info.json file to see whether they were preprocessed as categorical. Unfortunately, none of them were recognized as categorical by mljar. I then used the following code to convert these columns to categorical manually:

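# enum.txt lists one categorical column name per line
with open('enum.txt', 'r') as enum_file:
    categorical_columns = enum_file.read().splitlines()
# Cast each listed (integer-coded) column to pandas' category dtype
for col in categorical_columns:
    df[col] = df[col].astype("category")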
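The error below names only the `target` field, which suggests the target column was listed in `enum.txt` and got cast along with the features. A minimal pandas sketch of casting only the features, assuming the target column is literally named `target` (taken from the error message); the toy `df`, its column names, and the in-memory stand-in for `enum.txt` are hypothetical:

```python
import io

import pandas as pd

# Toy data standing in for the real dataset (hypothetical columns and values).
df = pd.DataFrame({
    "color_id": [1, 2, 1, 3],
    "size_id": [10, 20, 10, 30],
    "price": [9.5, 3.2, 4.1, 7.7],
    "target": [0, 1, 0, 1],
})

# In-memory stand-in for enum.txt: one column name per line.
enum_file = io.StringIO("color_id\nsize_id\ntarget\n")
categorical_columns = enum_file.read().splitlines()

TARGET = "target"  # assumed target column name, taken from the error message

# Cast only the feature columns; the target keeps its numeric dtype.
for col in categorical_columns:
    if col != TARGET:
        df[col] = df[col].astype("category")

X = df.drop(columns=[TARGET])
y = df[TARGET]
print(X.dtypes)
print(y.dtype)
```

Keeping the target out of the cast leaves the label as plain integers, which is the dtype the error message asks for.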

After doing this, I got an error:

ValueError: pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: target: category
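The message reads like a validation step that rejects any column whose pandas dtype is not integer, float or boolean. A minimal sketch of such a check, using a hypothetical helper `bad_dtype_columns` (not mljar's or any library's actual code), shows why a `category` column is flagged even when every value in it is an integer:

```python
import pandas as pd


def bad_dtype_columns(frame: pd.DataFrame) -> dict:
    """Return {column: dtype name} for columns that are not int, float or bool."""
    bad = {}
    for col, dtype in frame.dtypes.items():
        if not (
            pd.api.types.is_integer_dtype(dtype)
            or pd.api.types.is_float_dtype(dtype)
            or pd.api.types.is_bool_dtype(dtype)
        ):
            bad[col] = str(dtype)
    return bad


# Integer-coded labels cast to category, as in the code above.
labels = pd.DataFrame({"target": pd.Series([0, 1, 1, 0]).astype("category")})
print(bad_dtype_columns(labels))                  # {'target': 'category'}
print(bad_dtype_columns(labels.astype("int64")))  # {}
```

The check looks only at the dtype, not the values, so integer labels wrapped in `category` fail while the same labels as `int64` pass.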

It seems that mljar cannot preprocess categorical data stored as numbers.
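If the target has already been cast to `category`, one possible workaround (a sketch, not a documented mljar fix) is to turn it back into plain integer labels before fitting. Because the labels here are numbers, a direct cast restores them; the alternative keeps 0..n-1 codes plus the category list for decoding predictions later. The toy series is hypothetical:

```python
import pandas as pd

# A numeric target that was cast to category, as in the issue (toy values).
y_cat = pd.Series([3, 7, 3, 9, 7], name="target").astype("category")

# Option 1: the labels are numbers, so cast straight back to an integer dtype.
y_int = y_cat.astype("int64")

# Option 2: use 0..n-1 codes and keep the categories to decode predictions later.
y_codes = y_cat.cat.codes.astype("int64")
categories = y_cat.cat.categories
decoded = categories[y_codes.to_numpy()]  # stand-in for decoding model predictions

print(y_int.tolist())    # [3, 7, 3, 9, 7]
print(y_codes.tolist())  # [0, 1, 0, 2, 1]
print(decoded.tolist())  # [3, 7, 3, 9, 7]
```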

williamty • Sep 17 '23

It should handle the category data type. This might be a bug.

pplonski • Sep 18 '23