
Inclusion of more regressors and classifiers in the Model Selection


Hi team,

Thank you for such a helpful library. While using the TPOT library, we found that certain regressors and classifiers are not included in the model selection of the machine learning pipeline. It would be great to add some regressors such as Gaussian Process Regressor and Voting Regressor, and classifiers such as Voting Classifier and AdaBoost Classifier.

How to recreate it?

  1. User creates a TPOT instance
  2. User calls the TPOT fit() function with training data (a minimal sketch follows this list)
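
For concreteness, here is a minimal sketch of those two steps; the dataset, estimator settings, and random seeds are illustrative assumptions, not taken from the original report.

from tpot import TPOTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Illustrative data only; any regression training set shows the same behaviour
X, y = make_regression(n_samples=200, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Step 1: create a TPOT instance using the default (built-in) search space
tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2, random_state=42)

# Step 2: fit on the training data
tpot.fit(X_train, y_train)

# Inspecting the evaluated pipelines (e.g. tpot.fitted_pipeline_ or the exported
# script) shows that operators such as GaussianProcessRegressor or VotingRegressor
# never appear, because they are not in the default configuration.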

Current result

The above-mentioned regressors and classifiers are not included in the model selection of the machine learning pipeline.

ankitrajixr · Mar 07 '21

Hi @ankitrajixr, these may have previously been found to not play well with other parts of TPOT, and that may be why they are not included by default.

However, you can add any scikit-learn classifier or regressor to TPOT by simply including it in a custom configuration dictionary. Please see: https://epistasislab.github.io/tpot/using/#customizing-tpots-operators-and-parameters
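
As a minimal sketch of that approach (the operator choice, hyperparameter values, and TPOT settings below are illustrative, not the documented defaults), a custom config dictionary maps each operator's full import path to the hyperparameter values TPOT may search over:

from tpot import TPOTClassifier

# Illustrative custom configuration with one of the requested classifiers
custom_config = {
    'sklearn.ensemble.AdaBoostClassifier': {
        'n_estimators': [50, 100, 200],
        'learning_rate': [0.01, 0.1, 1.0],
    },
}

tpot = TPOTClassifier(generations=5, population_size=20,
                      config_dict=custom_config, verbosity=2)
# tpot.fit(X_train, y_train) followed by tpot.export('pipeline.py') as usual

Note that, as far as I understand, passing config_dict replaces the built-in search space rather than extending it, so you would merge entries like these into a copy of the default dictionary if you still want the standard operators searched as well.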

If you can use them and they perform well, we can look into adding them to the built-in configuration dictionaries. I'd recommend giving it a try and letting us know (on this thread) how they perform.

JDRomano2 · Mar 15 '21

Thank you for your response @JDRomano2. I have tried the custom TPOT config dictionary; below is the code snippet for it.

from sklearn.gaussian_process.kernels import (RBF, RationalQuadratic, ExpSineSquared,
                                              ConstantKernel, DotProduct, Matern)

# Custom TPOT config: the operator's full import path maps to its hyperparameter grid
tpot_config = {
    'sklearn.gaussian_process.GaussianProcessRegressor': {
        'kernel': [1.0 * RBF(length_scale=0.5, length_scale_bounds=(1e-05, 100000.0)),
                   1.0 * RationalQuadratic(length_scale=0.5, alpha=0.1),
                   1.0 * ExpSineSquared(length_scale=0.5, periodicity=3.0,
                                        length_scale_bounds=(1e-05, 100000.0),
                                        periodicity_bounds=(1.0, 10.0)),
                   ConstantKernel(0.1, (0.01, 10.0)) * (DotProduct(sigma_0=1.0,
                                        sigma_0_bounds=(0.1, 10.0)) ** 2),
                   1.0 ** 2 * Matern(length_scale=0.5,
                                     length_scale_bounds=(1e-05, 100000.0), nu=0.5)],
        'alpha': [5e-9, 1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'normalize_y': [True, False],
        'optimizer': ['fmin_l_bfgs_b'],
    }
}
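
This dictionary is presumably then passed to TPOT via config_dict, roughly as follows (the TPOT settings here are illustrative):

from tpot import TPOTRegressor

# Restrict the search to the GaussianProcessRegressor configuration defined above
tpot = TPOTRegressor(generations=5, population_size=20,
                     config_dict=tpot_config, verbosity=2)
# tpot.fit(X_train, y_train)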

The above code works fine for smaller datasets.

ankitrajixr · Mar 17 '21