List of score_func arguments for feature selection
Can I add a list for score_func, so that the optimal function is selected by TPOT during optimization?
'sklearn.feature_selection.SelectKBest': {
    'k': range(3, 20),
    'score_func': [
        'sklearn.feature_selection.mutual_info_regression',
        'sklearn.feature_selection.f_regression'
    ]
},
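For context, a custom dictionary like this is normally handed to TPOT through its config_dict argument. The following is only a rough sketch: the search budget, scoring function, and the extra ElasticNetCV entry (added so the configuration contains at least one regressor) are placeholder choices, not part of the original question.

from tpot import TPOTRegressor

# Illustrative custom configuration: only the operators listed here are
# considered during the search (parameter values are placeholders).
tpot_config = {
    'sklearn.feature_selection.SelectKBest': {
        'k': range(3, 20),
        'score_func': {
            'sklearn.feature_selection.f_regression': None
        }
    },
    'sklearn.linear_model.ElasticNetCV': {
        'l1_ratio': [0.25, 0.5, 0.75],
        'tol': [1e-4, 1e-3]
    },
}

tpot = TPOTRegressor(
    config_dict=tpot_config,           # restrict the search to the operators above
    generations=5,                     # placeholder search budget
    population_size=20,
    scoring='neg_mean_squared_error',  # placeholder pipeline scoring function
    cv=5,
    random_state=42,
    verbosity=2,
)
# tpot.fit(X_train, y_train) would then run the optimization with this configuration.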
Hi @hanshupe, this is likely not possible because different scoring functions have fundamentally different meanings and tell you different things about a model's performance. For example, the F1-score is a composite score derived from both precision and recall, so how would you determine whether an F1-score is "better" than a recall score? And how would you compare Cohen's kappa (which ranges between -1 and +1) with a score that ranges from 0 to 1?
I don't understand why that should not work. TPOT optimizes the pipeline globally, based on the defined scoring function, cross-validation settings, and so on.
With SelectKBest you can use different score functions; the features are ranked by the chosen function and the top k are kept according to the k parameter. In the end, TPOT could simply select the SelectKBest parameters (including score_func) that maximise its test score.
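To illustrate the idea outside TPOT's configuration format, here is a plain scikit-learn sketch in which score_func is treated as just another categorical hyperparameter and cross-validation picks the best combination; the dataset and the Ridge model are arbitrary placeholders, not part of the original question.

from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=200, n_features=30, random_state=0)

pipe = Pipeline([
    ('select', SelectKBest()),
    ('model', Ridge()),
])

# score_func is just another categorical hyperparameter: cross-validation
# picks whichever (score_func, k) combination gives the best score.
param_grid = {
    'select__score_func': [f_regression, mutual_info_regression],
    'select__k': list(range(3, 20)),
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring='neg_mean_squared_error')
search.fit(X, y)
print(search.best_params_)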
Can you reopen the question? I think it would be an important feature if it is not possible yet.
My apologies, I misunderstood your question. I mistakenly thought you were referring to the overall scoring function used to evaluate a pipeline rather than the scoring function used within the feature selector operator.
Just to clarify, does this work when you provide a custom configuration dictionary containing the sample configuration you gave in your initial question?
I use a custom configuration, which includes:
'sklearn.feature_selection.SelectKBest': {
    'k': range(3, 20),
    'score_func': {
        'sklearn.feature_selection.mutual_info_regression': None
    }
},
I would like to include additional score functions in the optimization process. I am not sure whether I just don't know the correct syntax for specifying a list of score_func options, or whether it is simply not supported.
By the way, the same question applies to GaussianProcessRegressor, where I would like to include multiple kernels in the selection process:
'sklearn.gaussian_process.GaussianProcessRegressor': {
    'kernel': {
        'sklearn.gaussian_process.kernels.DotProduct': {}
    }
}
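Again purely as an illustration of the intent (plain scikit-learn, not TPOT syntax), the kernel can be searched as a categorical hyperparameter; the candidate kernels and the toy dataset below are arbitrary examples.

from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, DotProduct, Matern
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# The kernel is a single categorical hyperparameter; cross-validation
# selects whichever candidate kernel gives the best score.
param_grid = {'kernel': [DotProduct(), RBF(), Matern()]}

search = GridSearchCV(GaussianProcessRegressor(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_['kernel'])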