mljar-supervised

Add Ability to Convert Continuous Data to Categorical

Open eladmw opened this issue 5 years ago • 2 comments

Add the ability to convert continuous data to categorical, for example with sklearn.preprocessing.KBinsDiscretizer.
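For context, here is a minimal sketch of what KBinsDiscretizer does on its own, outside of mljar-supervised. The toy array and the choice of three uniform-width bins are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Toy continuous feature (illustrative data, not from the issue).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Three equal-width bins, encoded as ordinal bin indices 0, 1, 2.
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
X_binned = disc.fit_transform(X)

# Bin edges learned for the single feature.
edges = disc.bin_edges_[0]
```

The fitted object can then be reused with disc.transform(...) on new data, so the same bin edges apply at prediction time.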

eladmw, Jul 10 '20 03:07

It can be added for sure. But for now, it is hard for me to tell you how this can be exposed to the user. Options that I can see:

  • AutoML detects when a discretized feature helps the model achieve better performance and applies the discretization automatically.
  • The user specifies which features need to be discretized. This will probably need a new method in the interface.
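The first option could be approximated by comparing cross-validated scores with and without binning. The sketch below is only an illustration under assumptions: discretization_helps is a hypothetical helper, not part of mljar-supervised, and the logistic regression model, bin count, and synthetic data are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer


def discretization_helps(X, y, n_bins=5, cv=3):
    """Hypothetical check: does binning the features improve the CV score?"""
    base = LogisticRegression(max_iter=1000)
    binned = make_pipeline(
        # One-hot encode the bins so a linear model can fit non-monotonic effects.
        KBinsDiscretizer(n_bins=n_bins, encode="onehot-dense", strategy="uniform"),
        LogisticRegression(max_iter=1000),
    )
    base_score = cross_val_score(base, X, y, cv=cv).mean()
    binned_score = cross_val_score(binned, X, y, cv=cv).mean()
    return bool(binned_score > base_score), base_score, binned_score


# Synthetic data with a non-monotonic relationship between feature and target.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = (np.abs(X[:, 0]) < 1).astype(int)

helps, base_score, binned_score = discretization_helps(X, y)
```

A real implementation would have to run this check per feature and per algorithm, which is part of why the automatic option is harder to expose than it first looks.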

Any ideas on how you would like to see it?

pplonski, Jul 10 '20 04:07

User-provided info could be accepted. For example, given model = AutoML(), a call like model.discretize(train, columns=[], bins=int, replace=False), where replace=True would swap the continuous columns for their categorical versions. It's also possible for this to be detected during the fit process and reported to the user as a recommendation. Also, I think that more categorical features would normally help, because CatBoost looks for combinations of categorical variables to aid in boosting.
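A rough sketch of how the proposed call might behave, written as a standalone function because AutoML has no discretize method. Everything here is hypothetical: the function name, the "_binned" column suffix, and the uniform binning strategy are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer


def discretize(train, columns, bins=5, replace=False):
    """Hypothetical helper mirroring the proposed signature.

    Returns a copy of `train` with binned versions of `columns`,
    plus the fitted discretizer so it can be reused on new data.
    """
    disc = KBinsDiscretizer(n_bins=bins, encode="ordinal", strategy="uniform")
    binned = np.asarray(disc.fit_transform(train[columns])).astype(int)
    out = train.copy()
    for i, col in enumerate(columns):
        # Overwrite the original column, or add a new one next to it.
        target = col if replace else f"{col}_binned"
        out[target] = pd.Categorical(binned[:, i])
    return out, disc


train = pd.DataFrame({
    "age": [20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
    "city": ["a", "b", "a", "b", "a", "b"],
})

added, _ = discretize(train, ["age"], bins=3)
replaced, _ = discretize(train, ["age"], bins=3, replace=True)
```

With replace=False the original continuous column is kept alongside the new categorical one, so the model can still use both.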

eladmw, Jul 11 '20 19:07