
List of score_func arguments for feature selection

Open hanshupe opened this issue 3 years ago • 6 comments

Can I supply a list of options for score_func, so that the optimal function is selected by TPOT during the optimization? Something like:

        'sklearn.feature_selection.SelectKBest': {
            'k': range(3, 20),
            'score_func': {
                'sklearn.feature_selection.mutual_info_regression': None,
                'sklearn.feature_selection.f_regression': None
            }
        },
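For context, here is a self-contained sketch of how I would pass such a custom configuration to TPOT. The `config_dict` parameter is real TPOT API; whether listing several entries under `score_func` actually works is exactly my question, so treat the multi-entry part as hypothetical:

```python
# Sketch of a custom TPOT configuration dictionary. In TPOT config dicts,
# nested parameter choices are written as a dict mapping a class path to its
# own parameter dict (or None). Whether several score_func alternatives can
# be offered this way is the open question of this issue.
tpot_config = {
    'sklearn.feature_selection.SelectKBest': {
        'k': range(3, 20),
        'score_func': {
            'sklearn.feature_selection.mutual_info_regression': None,
            'sklearn.feature_selection.f_regression': None,
        },
    },
}

# It would then be passed to the estimator, e.g.:
#   from tpot import TPOTRegressor
#   tpot = TPOTRegressor(config_dict=tpot_config, generations=5, cv=5)
#   tpot.fit(X_train, y_train)
```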

hanshupe avatar Mar 25 '21 20:03 hanshupe

Hi @hanshupe, this is likely not possible because different scoring functions have fundamentally different meanings, and are used to tell different things about the performance of a model. E.g., F1-score is a composite score derived from both the precision and the recall, so how would you determine if an F1-score is "better" than a recall score? Also, how can you compare Cohen's kappa (which ranges between -1 and +1) with a different score that ranges from 0 to 1?

JDRomano2 avatar Mar 25 '21 21:03 JDRomano2

I don't understand why that should not work. TPOT optimizes the pipeline globally based on the defined scoring function, cross-validation settings, etc.

So you can use the SelectKBest feature selection method with different score functions; features are ranked and selected based on the k parameter. Finally, TPOT could select the SelectKBest parameters which maximise the TPOT test score.

hanshupe avatar Mar 25 '21 21:03 hanshupe

Can you reopen the question? I think this would be an important feature if it is not supported yet.

hanshupe avatar Mar 25 '21 21:03 hanshupe

My apologies, I misunderstood your question. I mistakenly thought you were referring to the overall scoring function used to evaluate a pipeline rather than the scoring function used within the feature selector operator.

JDRomano2 avatar Mar 25 '21 21:03 JDRomano2

Just to clarify, does this work when you provide a custom configuration dictionary containing the sample configuration you gave in your initial question?

JDRomano2 avatar Mar 25 '21 21:03 JDRomano2

I use a custom configuration, which includes:

        'sklearn.feature_selection.SelectKBest': {
            'k': range(3, 20),
            'score_func': {
                'sklearn.feature_selection.mutual_info_regression': None
            }
        },

I would like to include additional score_func options in the optimization process. I'm not sure whether I just don't know the correct syntax for specifying several score_func alternatives, or whether it's simply not supported.

Btw, the same question applies to the GaussianProcessRegressor, where I would like to include multiple kernels in the selection process:

    'sklearn.gaussian_process.GaussianProcessRegressor': {
        'kernel': {
            'sklearn.gaussian_process.kernels.DotProduct': {}
        }
    }
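A sketch of what I have in mind for the kernel case. The kernel classes are real scikit-learn kernels; whether TPOT will instantiate one of several alternatives from a config dict like this is again the open question, so the multi-entry form is hypothetical:

```python
# Hypothetical config offering multiple GaussianProcessRegressor kernels for
# TPOT to choose among during optimization. Each kernel's inner dict would
# hold its own tunable parameters (empty here, meaning defaults).
gp_config = {
    'sklearn.gaussian_process.GaussianProcessRegressor': {
        'kernel': {
            'sklearn.gaussian_process.kernels.DotProduct': {},
            'sklearn.gaussian_process.kernels.RBF': {},
        },
    },
}
```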

hanshupe avatar Mar 25 '21 22:03 hanshupe