tpot
Adding preprocessing to config dict
I am trying to use TPOT to select among Gaussian process regressors, and I want to include MinMaxScaler as a preprocessing step. Here is my configuration dictionary:
from sklearn.gaussian_process.kernels import (
    RBF, ConstantKernel, DotProduct, ExpSineSquared,
    Matern, RationalQuadratic, WhiteKernel,
)

gaussian_config = {
    'sklearn.gaussian_process.GaussianProcessRegressor': {
        'kernel': [
            1.0**2*Matern(length_scale=0.5, length_scale_bounds=(1e-05, 100000.0), nu=1.5)+WhiteKernel(0.1),
            1.0*RBF(length_scale=0.5, length_scale_bounds=(1e-05, 100000.0))+WhiteKernel(0.1),
            1.0*RationalQuadratic(length_scale=0.5, alpha=0.1)+WhiteKernel(0.1),
            1.0*ExpSineSquared(length_scale=0.5, periodicity=3.0, length_scale_bounds=(1e-05, 100000.0), periodicity_bounds=(1.0, 10.0))+WhiteKernel(0.1),
            ConstantKernel(0.1, (0.01, 10.0))*(DotProduct(sigma_0=1.0, sigma_0_bounds=(0.1, 10.0)) ** 2)+WhiteKernel(0.1),
        ],
        'alpha': [5e-9, 1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'normalize_y': [True, False],
        'optimizer': ['fmin_l_bfgs_b']
    },
    # Empty dict: MinMaxScaler is included with its default parameters.
    'sklearn.preprocessing.MinMaxScaler': {
    }
}
kernel_dict = {
    "RBF": RBF,
    "RationalQuadratic": RationalQuadratic,
    "ExpSineSquared": ExpSineSquared,
    "ConstantKernel": ConstantKernel,
    "DotProduct": DotProduct,
    "WhiteKernel": WhiteKernel,
    "Matern": Matern,
}
The workflow is:
from tpot import TPOTRegressor

tpot_obj = TPOTRegressor(generations=50,
                         population_size=100,
                         verbosity=3,
                         cv=5,
                         config_dict=gaussian_config,
                         template='Regressor',
                         scoring='r2',
                         random_state=42)
#tpot_obj = TPOTRegressor(template='Regressor', verbosity=2)
# Initialize TPOT's internals first so the kernel classes can be registered
tpot_obj._fit_init()
tpot_obj.operators_context.update(kernel_dict)
# Keep the state set up above when fit() is called
tpot_obj.warm_start = True
tpot = tpot_obj.fit(X_train, y_train)
tpot.fitted_pipeline_
I can't find any reference to MinMaxScaler in the fitted pipeline or in tpot.evaluated_individuals_, so I must be doing something incorrectly. Any advice will be greatly appreciated.
Hi @wayneking517, thanks for your issue report.
From what I can see, you have added the preprocessor to the configuration dictionary correctly. However, when instantiating the TPOT object, you passed the argument:
template='Regressor'
With this template, TPOT builds and evaluates pipelines that consist of a regressor only.
To include preprocessors/transformers, either remove the template argument (so that TPOT can use every operator in the configuration dictionary, not just the regressors, which here means only GaussianProcessRegressor) or change the argument to:
template='Transformer-Regressor'
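As a rough sketch of that change, assuming the same gaussian_config and kernel_dict from your post and a TPOT version that still accepts the template argument: the helper names make_tpot, pipeline_step_names, and count_individuals_with are hypothetical, made up here for illustration, and are not part of TPOT. The first rebuilds your TPOTRegressor with the new template; the other two only inspect the results, so you can check whether MinMaxScaler was used.

```python
def make_tpot(config_dict, template="Transformer-Regressor"):
    """Hypothetical helper: same settings as in the question, different template."""
    # Imported lazily so the inspection helpers below work without TPOT installed.
    from tpot import TPOTRegressor
    return TPOTRegressor(
        generations=50,
        population_size=100,
        verbosity=3,
        cv=5,
        config_dict=config_dict,
        template=template,  # one transformer step, then one regressor step
        scoring="r2",
        random_state=42,
    )


def pipeline_step_names(fitted_pipeline):
    """Hypothetical helper: class name of each step in a fitted pipeline."""
    # Fall back to treating the object as a single step if it has no .steps.
    steps = getattr(fitted_pipeline, "steps", [("only", fitted_pipeline)])
    return [type(step).__name__ for _, step in steps]


def count_individuals_with(evaluated_individuals, operator_name):
    """Hypothetical helper: how many evaluated pipeline strings mention an operator."""
    # Assumes the keys of evaluated_individuals_ are pipeline strings.
    return sum(operator_name in key for key in evaluated_individuals)


# Usage, reusing the objects from the question (not executed here):
# tpot_obj = make_tpot(gaussian_config)
# tpot_obj._fit_init()
# tpot_obj.operators_context.update(kernel_dict)
# tpot_obj.warm_start = True
# tpot_obj.fit(X_train, y_train)
# print(pipeline_step_names(tpot_obj.fitted_pipeline_))
# print(count_individuals_with(tpot_obj.evaluated_individuals_, "MinMaxScaler"))
```

Since MinMaxScaler is the only transformer in your dictionary, I would expect every pipeline built from this template to start with it. If you want TPOT to also compare against unscaled pipelines, removing the template is probably the better option.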
Hope this helps.