Expose Ensembling from ML.NET

Open ganik opened this issue 5 years ago • 1 comment

what title says

ganik avatar Jul 20 '19 00:07 ganik

PR #207 exposed EnsembleClassifier (multiclass) and EnsembleRegressor, along with the components that sample subsets of the data for training each model in the ensemble, and the components that select a subset of the trained models and combine their outputs to form the ensemble.
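For readers unfamiliar with the pattern, the four stages described above (sample subsets, train one model per subset, select a subset of the models, combine their outputs) can be sketched in plain Python. This is only an illustration of the general technique, not the ML.NET or NimbusML implementation: the one-feature threshold "stump" stands in for the real base learners, and every function name, parameter, and data value below is hypothetical.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a toy base model: predict 1 when x >= threshold (hypothetical learner)."""
    best_threshold, best_correct = None, -1
    for threshold, _ in rows:
        correct = sum((x >= threshold) == bool(y) for x, y in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(threshold, x):
    return int(x >= threshold)


def accuracy(threshold, rows):
    return sum(predict(threshold, x) == y for x, y in rows) / len(rows)


def train_ensemble(train, validation, num_models=5, keep=3, seed=0):
    rng = random.Random(seed)
    # 1. Sample one subset of the training data per model (bootstrap sampling here).
    subsets = [[rng.choice(train) for _ in train] for _ in range(num_models)]
    # 2. Train one base model on each subset.
    models = [train_stump(subset) for subset in subsets]
    # 3. Select the models that perform best on held-out data.
    models.sort(key=lambda m: accuracy(m, validation), reverse=True)
    return models[:keep]


def ensemble_predict(models, x):
    # 4. Combine the selected models' outputs by majority vote.
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Made-up (feature, label) pairs, for illustration only.
train = [(0.1, 0), (0.4, 0), (0.5, 0), (1.2, 0),
         (2.8, 1), (3.1, 1), (3.5, 1), (4.0, 1)]
validation = [(0.3, 0), (1.0, 0), (3.0, 1), (3.8, 1)]
ensemble = train_ensemble(train, validation)
```

Calling `ensemble_predict(ensemble, x)` then returns the majority vote of the three retained stumps. The sampling strategy, the selection criterion, and the combiner are each a separate step in this sketch, mirroring the separate components mentioned above.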

Remaining work:

  1. Expose BasePredictor in ML.NET and then in NimbusML so that NimbusML users can specify one or more learners of their choice to use in the ensemble, instead of the default LogisticRegressionClassifier for EnsembleClassifier and OnlineGradientDescentRegressor for EnsembleRegressor.

This would entail rewriting EnsembleTrainerBase and its derived classes in ML.NET as IEstimator instead of ITrainer, and also writing an ITransformer for them that would be produced by fitting the estimator.

  2. Expose EnsembleBinaryClassifier in NimbusML. Currently, binary classification can be done with EnsembleClassifier, but a dedicated binary classifier would be useful so that, once BasePredictor is exposed, users are not restricted to the multiclass classifiers in NimbusML for binary classification.

The reason for not exposing the binary classifier is that NimbusML adds a LabelColumnKeyBooleanConverter to a Pipeline, which converts the label to Key, not Boolean. As EnsembleTrainer is currently implemented in ML.NET (i.e. as ITrainer), the label goes through type checks that require it to be Boolean. When implemented as IEstimator as in (1), the label would go through a different series of checks, which allow it to be Boolean or Key with a key count of two.
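To make the difference between the two sets of label checks concrete, here is a toy Python model of them. It is a sketch under stated assumptions, not ML.NET code: `ColumnType`, `accepts_label_as_trainer`, and `accepts_label_as_estimator` are hypothetical names, and the only behavior taken from the description above is that the current path accepts Boolean labels only, while the estimator path would also accept a Key label with two key values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnType:
    """Hypothetical stand-in for a label column type."""
    kind: str                        # "Boolean" or "Key" in this sketch
    key_count: Optional[int] = None  # only meaningful when kind == "Key"


def accepts_label_as_trainer(label: ColumnType) -> bool:
    # Current (ITrainer-style) check as described above: Boolean only.
    return label.kind == "Boolean"


def accepts_label_as_estimator(label: ColumnType) -> bool:
    # Proposed (IEstimator-style) check as described above:
    # Boolean, or Key with exactly two key values.
    return label.kind == "Boolean" or (label.kind == "Key" and label.key_count == 2)


# The label NimbusML would hand over after converting it to Key.
converted_label = ColumnType("Key", key_count=2)
rejected_today = not accepts_label_as_trainer(converted_label)
accepted_after_rewrite = accepts_label_as_estimator(converted_label)
```

In this model the Key label produced by the converter fails the first check and passes the second, which is why the binary classifier depends on the rewrite in (1).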

najeeb-kazmi avatar Aug 07 '19 19:08 najeeb-kazmi