mljar-supervised

How does MLJar detect whether a feature is categorical or continuous?

Open · offchan42 opened this issue 2 years ago · 5 comments

If I feed month values as 0,1,2,...,11 will MLJar detect this as continuous or categorical? Basically I don't want the model to come up with a threshold to separate the months into 2 groups. I want it to do checking like "if month=X, classify the sample as Y" that kind of thing.

offchan42, Dec 25 '21 19:12

Hi @off99555, MLJAR AutoML checks whether a column is numeric or object (string). If it is numeric, it is treated as continuous. If the column is of object (string) type, AutoML decides what kind of preprocessing should be applied.

The code that checks the types of the columns is here: https://github.com/mljar/mljar-supervised/blob/master/supervised/tuner/data_info.py

After AutoML training you can check how columns were preprocessed by checking the framework.json file inside the model directory (each model can have different preprocessing).
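As a rough sketch of that check: the snippet below collects every framework.json found one level below a results directory and returns the parsed contents, so the preprocessing recorded for each model can be read. The helper name `load_framework_files` and the directory name `AutoML_1` are hypothetical, and the snippet assumes nothing about the keys inside the file; it only loads the JSON.

```python
import json
from pathlib import Path


def load_framework_files(results_path):
    """Return {model directory name: parsed framework.json} for a results directory."""
    found = {}
    # Assumption: each model has its own sub-directory holding a framework.json file.
    for path in sorted(Path(results_path).glob("*/framework.json")):
        with open(path) as f:
            found[path.parent.name] = json.load(f)
    return found


# Hypothetical usage; replace "AutoML_1" with your own results directory:
# for model_name, framework in load_framework_files("AutoML_1").items():
#     print(model_name, json.dumps(framework, indent=2))
```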

pplonski, Dec 30 '21 09:12

OK. So it means that it will treat my feature as continuous unless I change the type to string. Thank you. Maybe this info should be included in the doc?
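For anyone landing here, a minimal sketch of that workaround with pandas, assuming the data is in a DataFrame; the column names and values are made up for illustration, and this is not code from mljar-supervised itself:

```python
import pandas as pd

# Made-up example data: month stored as integer codes 0..11.
df = pd.DataFrame({
    "month": [0, 1, 2, 11],
    "temperature": [1.5, 3.0, 7.2, 2.1],
})

# As integers the column is numeric, so it would be treated as continuous.
month_is_numeric_before = pd.api.types.is_numeric_dtype(df["month"])

# Cast the codes to strings so the column is no longer numeric.
df["month"] = df["month"].astype(str)
month_is_numeric_after = pd.api.types.is_numeric_dtype(df["month"])

print(month_is_numeric_before, month_is_numeric_after)
```

The cast has to happen before the data is passed to AutoML; the other numeric columns stay untouched.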

offchan42, Dec 30 '21 09:12

@off99555 good idea, I will add this to the docs.

pplonski, Dec 30 '21 09:12

@pplonski How about an explicit parameter to set categorical features?

strukevych, Nov 08 '23 19:11

Hi @strukevych, here is the code that checks the feature type: https://github.com/mljar/mljar-supervised/blob/master/supervised/preprocessing/preprocessing_utils.py

pplonski, Nov 09 '23 16:11