mljar-supervised
Add Ability to Convert Continuous Data to Categorical
Add the ability to convert continuous data to categorical, for example with sklearn.preprocessing.KBinsDiscretizer.
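For context, here is a minimal sketch of what KBinsDiscretizer does on its own in scikit-learn, outside of mljar-supervised. The toy array and the choice of three equal-width bins are assumptions made only for illustration.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Toy continuous feature (illustrative data): one column, six samples.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# encode="ordinal" returns one integer bin index per value;
# strategy="uniform" splits the observed range into equal-width bins.
discretizer = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
X_binned = discretizer.fit_transform(X)

bin_ids = X_binned.ravel().astype(int).tolist()
print(bin_ids)  # [0, 0, 1, 1, 2, 2]

# The learned bin edges for the single feature (n_bins + 1 values).
print(discretizer.bin_edges_[0])
```

Other strategies ("quantile", "kmeans") and encodings ("onehot", "onehot-dense") are available if equal-width ordinal bins are not a good fit.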
It can be added for sure. But for now, it is hard for me to tell you how this can be exposed to the user. Options that I can see:
- AutoML detects when discretizing a feature helps the model achieve better performance and applies it automatically.
- The user provides the info on which features need to be discretized. This will probably need a new method in the interface.
Any ideas on how you would like to see it?
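To make the first option more concrete, one possible way to detect whether discretization helps is to compare cross-validated scores with and without a binning step. The sketch below uses plain scikit-learn, not mljar-supervised internals; the function name discretization_helps, the logistic regression model, the synthetic data, and the bin count are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer


def discretization_helps(X, y, n_bins=6, cv=3):
    """Hypothetical check: does binning improve mean CV accuracy?"""
    raw_model = LogisticRegression(max_iter=1000)
    binned_model = make_pipeline(
        # One-hot encode the bins so a linear model can fit each bin separately.
        KBinsDiscretizer(n_bins=n_bins, encode="onehot-dense", strategy="uniform"),
        LogisticRegression(max_iter=1000),
    )
    raw_score = cross_val_score(raw_model, X, y, cv=cv).mean()
    binned_score = cross_val_score(binned_model, X, y, cv=cv).mean()
    return bool(binned_score > raw_score), raw_score, binned_score


# Synthetic data where the target depends non-monotonically on the feature,
# which a linear model on the raw value cannot capture.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(300, 1))
y = (np.abs(X[:, 0]) < 1.0).astype(int)

helps, raw_score, binned_score = discretization_helps(X, y)
print(helps, raw_score, binned_score)
```

A real implementation would need to run this per feature and per model type, since tree-based models already split on thresholds and may benefit much less than linear ones.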
User input could be accepted. For example, with model = AutoML(), a call like model.discretize(train, columns=[], bins=int, replace=False), where replace=True would replace the continuous columns with their categorical versions. It would also be possible to detect this during the fit process and recommend it to the user as an output. Also, I think that more categorical features would normally help, because CatBoost looks for combinations of categorical variables to aid in boosting.
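As a rough sketch of the proposed signature, the standalone function below mimics discretize(train, columns, bins, replace) using pandas. It is hypothetical: AutoML has no such method in this thread, and the "_binned" column suffix, the use of equal-width pd.cut bins, and the sample data are assumptions.

```python
import pandas as pd


def discretize(df, columns, bins, replace=False):
    """Hypothetical helper mirroring the signature suggested above.

    Returns a copy of df in which each listed column is binned into
    `bins` equal-width intervals with categorical dtype.
    """
    out = df.copy()
    for col in columns:
        # pd.cut returns a categorical Series of interval labels.
        binned = pd.cut(out[col], bins=bins)
        if replace:
            out[col] = binned  # overwrite the continuous column
        else:
            out[f"{col}_binned"] = binned  # keep the original alongside
    return out


train = pd.DataFrame(
    {
        "age": [18, 25, 33, 47, 52, 68],
        "income": [20.0, 35.5, 41.0, 58.2, 61.9, 90.0],
    }
)

binned = discretize(train, columns=["age"], bins=3)
replaced = discretize(train, columns=["age"], bins=3, replace=True)
print(binned.dtypes)
print(replaced.dtypes)
```

Returning a copy keeps the caller's training data untouched, which makes it easy to compare models trained with and without the binned columns.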