
Adding preprocessing to config dict

Open wayneking517 opened this issue 3 years ago • 1 comments

I am trying to use TPOT to select among Gaussian process regressors, and I want to include MinMaxScaler as a preprocessing step. Here is my configuration dictionary:

# Kernel classes used in the search space below.
from sklearn.gaussian_process.kernels import (
    RBF, ConstantKernel, DotProduct, ExpSineSquared,
    Matern, RationalQuadratic, WhiteKernel,
)

gaussian_config = {
    'sklearn.gaussian_process.GaussianProcessRegressor': {
        'kernel': [
            1.0**2 * Matern(length_scale=0.5, length_scale_bounds=(1e-05, 100000.0), nu=1.5) + WhiteKernel(0.1),
            1.0 * RBF(length_scale=0.5, length_scale_bounds=(1e-05, 100000.0)) + WhiteKernel(0.1),
            1.0 * RationalQuadratic(length_scale=0.5, alpha=0.1) + WhiteKernel(0.1),
            1.0 * ExpSineSquared(length_scale=0.5, periodicity=3.0, length_scale_bounds=(1e-05, 100000.0), periodicity_bounds=(1.0, 10.0)) + WhiteKernel(0.1),
            ConstantKernel(0.1, (0.01, 10.0)) * (DotProduct(sigma_0=1.0, sigma_0_bounds=(0.1, 10.0)) ** 2) + WhiteKernel(0.1),
        ],
        'alpha': [5e-9, 1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'normalize_y': [True, False],
        'optimizer': ['fmin_l_bfgs_b']
    },
    # Empty dict: the scaler has no hyperparameters to search over.
    'sklearn.preprocessing.MinMaxScaler': {
    }
}
kernel_dict = {
    "RBF": RBF,
    "RationalQuadratic": RationalQuadratic,
    "ExpSineSquared": ExpSineSquared,
    "ConstantKernel": ConstantKernel,
    "DotProduct": DotProduct,
    "WhiteKernel": WhiteKernel,
    "Matern": Matern,
}

The workflow is:

from tpot import TPOTRegressor

tpot_obj = TPOTRegressor(generations=50,
                         population_size=100,
                         verbosity=3,
                         cv=5,
                         config_dict=gaussian_config,
                         template='Regressor',
                         scoring='r2',
                         random_state=42)

#tpot_obj = TPOTRegressor(template='Regressor', verbosity=2)



tpot_obj._fit_init()
tpot_obj.operators_context.update(kernel_dict)
tpot_obj.warm_start = True

tpot=tpot_obj.fit(X_train, y_train)
tpot.fitted_pipeline_

I can't find any reference to MinMaxScaler in the fitted pipeline or in tpot.evaluated_individuals_, so I must be doing something incorrectly. Any advice will be greatly appreciated.

wayneking517, Oct 28 '21 14:10

Hi @wayneking517, thanks for your issue report.

From what I can see, you have added the preprocessor to the configuration dictionary correctly. However, when instantiating the TPOT object, you passed the argument template='Regressor'.

This template means that TPOT will only include regressors in the constructed and evaluated pipelines.

If you want to include preprocessors/transformers, you have two options. Either remove the template argument, so that TPOT is free to use every operator in the configuration dictionary and not just the regressors (here, only GaussianProcessRegressor), or change the argument to template='Transformer-Regressor', so that each pipeline has a transformer step followed by a regressor step.
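As a rough sketch of the second option, the snippet below reuses the constructor arguments from your post and changes only the template. The names TPOT_KWARGS, build_tpot, and pipelines_using are hypothetical helpers made up for this example, not part of TPOT, and the sample dictionary is invented data that only mimics the shape of evaluated_individuals_ (pipeline strings as keys).

```python
# Constructor arguments copied from the question; only `template` differs.
TPOT_KWARGS = dict(
    generations=50,
    population_size=100,
    verbosity=3,
    cv=5,
    template="Transformer-Regressor",  # was "Regressor"
    scoring="r2",
    random_state=42,
)


def build_tpot(config_dict):
    """Hypothetical helper: create the regressor with the updated template.

    TPOT is imported lazily so this module can be loaded without it.
    """
    from tpot import TPOTRegressor
    return TPOTRegressor(config_dict=config_dict, **TPOT_KWARGS)


def pipelines_using(evaluated_individuals, operator_name):
    """Hypothetical helper: pipeline strings (dict keys) mentioning operator_name."""
    return [p for p in evaluated_individuals if operator_name in p]


# Invented stand-in for tpot.evaluated_individuals_; the key strings are
# illustrative placeholders, not real TPOT output.
sample = {
    "GaussianProcessRegressor(MinMaxScaler(input_matrix), alpha=0.1)": {},
    "GaussianProcessRegressor(input_matrix, alpha=0.1)": {},
}

print(pipelines_using(sample, "MinMaxScaler"))
```

After fitting, something like pipelines_using(tpot.evaluated_individuals_, "MinMaxScaler") should let you confirm that the scaler now appears in the evaluated pipelines.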

Hope this helps.

rachitk, Nov 01 '21 21:11