mljar-supervised
How does MLJar detect whether a feature is categorical or continuous?
If I feed month values as 0,1,2,...,11 will MLJar detect this as continuous or categorical? Basically I don't want the model to come up with a threshold to separate the months into 2 groups. I want it to do checking like "if month=X, classify the sample as Y" that kind of thing.
Hi @off99555, MLJAR AutoML checks whether a column is numeric or object (string). If it is numeric, it is treated as continuous. If the column is of type object (string), AutoML decides what kind of preprocessing should be applied.
The code that checks the types of the columns is here: https://github.com/mljar/mljar-supervised/blob/master/supervised/tuner/data_info.py
After AutoML training you can check how columns were preprocessed by looking at the framework.json file inside the model directory (each model can have different preprocessing).
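A minimal sketch of how those files could be collected for inspection. It assumes that each model has its own subdirectory under the AutoML results directory and that each one holds a framework.json file, as described above. The function name `load_framework_files` is hypothetical and not part of mljar-supervised, and nothing is assumed about the keys inside the file.

```python
import json
from pathlib import Path


def load_framework_files(results_dir):
    """Return {model_directory_name: parsed framework.json} for one AutoML run.

    Hypothetical helper, not part of mljar-supervised. It only assumes the
    layout <results_dir>/<model_dir>/framework.json.
    """
    frameworks = {}
    for path in sorted(Path(results_dir).glob("*/framework.json")):
        with open(path, "r", encoding="utf-8") as fin:
            frameworks[path.parent.name] = json.load(fin)
    return frameworks
```

Calling `load_framework_files(...)` with the results directory of your run and printing each value with `json.dumps(value, indent=2)` lets you read the preprocessing section for every model without assuming its exact layout.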
OK. So it means that it will treat my feature as continuous unless I change the type to string. Thank you. Maybe this info should be included in the doc?
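The workaround described above could look roughly like this in pandas. The data and column names are made up for illustration, and the sketch only covers the dtype change before training. It makes no claim about which categorical encoding AutoML will then pick.

```python
import pandas as pd

# Toy data; the column names are invented for this example.
df = pd.DataFrame(
    {"month": [0, 1, 2, 11], "sales": [10.5, 12.0, 9.8, 14.2]}
)

# Integer months have a numeric dtype, so they would be treated as continuous.
print(pd.api.types.is_numeric_dtype(df["month"]))

# Cast the values to strings. The extra astype(object) keeps the column as
# plain object dtype even on pandas versions that use a dedicated string dtype.
df["month"] = df["month"].astype(str).astype(object)

# The column is no longer numeric; pass this DataFrame to AutoML as usual.
print(df.dtypes)
```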
@off99555 good idea, I will add this to docs.
@pplonski How about an explicit parameter to set categorical features?
Hi @strukevych, here is the code that checks the feature type: https://github.com/mljar/mljar-supervised/blob/master/supervised/preprocessing/preprocessing_utils.py