
List of score_func arguments for feature selection

Open hanshupe opened this issue 3 years ago • 6 comments

Can I supply a list of options for score_func, so that the optimal function is selected by TPOT during the optimization? Something like:

        'sklearn.feature_selection.SelectKBest': {
            'k': range(3, 20),
            'score_func': {
                'sklearn.feature_selection.mutual_info_regression': None,
                'sklearn.feature_selection.f_regression': None
            }
        },
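For context, here is a self-contained sketch of how I would pass such a custom configuration to TPOT. The `config_dict` parameter is real TPOT API; whether listing several entries under `score_func` actually works is exactly my question, so treat the multi-entry part as hypothetical:

```python
# Sketch of a custom TPOT configuration dictionary. In TPOT config dicts,
# nested parameter choices are written as a dict mapping a class path to its
# own parameter dict (or None). Whether several score_func alternatives can
# be offered this way is the open question of this issue.
tpot_config = {
    'sklearn.feature_selection.SelectKBest': {
        'k': range(3, 20),
        'score_func': {
            'sklearn.feature_selection.mutual_info_regression': None,
            'sklearn.feature_selection.f_regression': None,
        },
    },
}

# It would then be passed to the estimator, e.g.:
#   from tpot import TPOTRegressor
#   tpot = TPOTRegressor(config_dict=tpot_config, generations=5, cv=5)
#   tpot.fit(X_train, y_train)
```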

hanshupe avatar Mar 25 '21 20:03 hanshupe

Hi @hanshupe, this is likely not possible because different scoring functions have fundamentally different meanings, and are used to tell different things about the performance of a model. E.g., F1-score is a composite score derived from both the precision and the recall, so how would you determine if an F1-score is "better" than a recall score? Also, how can you compare Cohen's kappa (which ranges between -1 and +1) with a different score that ranges from 0 to 1?

JDRomano2 avatar Mar 25 '21 21:03 JDRomano2

I don't understand why that should not work. TPOT optimizes the pipeline globally based on the defined scoring function, cross-validation settings, etc.

So you can use the SelectKBest feature selection method with different score functions; features are ranked and selected based on the k parameter. Finally, TPOT could select the SelectKBest parameters which maximise the TPOT test score.

hanshupe avatar Mar 25 '21 21:03 hanshupe

Can you reopen the question? I think this would be an important feature if it is not supported yet.

hanshupe avatar Mar 25 '21 21:03 hanshupe

My apologies, I misunderstood your question. I mistakenly thought you were referring to the overall scoring function used to evaluate a pipeline rather than the scoring function used within the feature selector operator.

JDRomano2 avatar Mar 25 '21 21:03 JDRomano2

Just to clarify, does this work when you provide a custom configuration dictionary containing the sample configuration you gave in your initial question?

JDRomano2 avatar Mar 25 '21 21:03 JDRomano2

I use a custom configuration, which includes:

        'sklearn.feature_selection.SelectKBest': {
            'k': range(3, 20),
            'score_func': {
                'sklearn.feature_selection.mutual_info_regression': None
            }
        },

I would like to include additional score_func options in the optimization process. I'm not sure whether I just don't know the correct syntax for specifying several score_func alternatives, or whether it's simply not supported.

Btw, the same question applies to the GaussianProcessRegressor, where I would like to include multiple kernels in the selection process:

    'sklearn.gaussian_process.GaussianProcessRegressor': {
        'kernel': {
            'sklearn.gaussian_process.kernels.DotProduct': {}
        }
    }
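A sketch of what I have in mind for the kernel case. The kernel classes are real scikit-learn kernels; whether TPOT will instantiate one of several alternatives from a config dict like this is again the open question, so the multi-entry form is hypothetical:

```python
# Hypothetical config offering multiple GaussianProcessRegressor kernels for
# TPOT to choose among during optimization. Each kernel's inner dict would
# hold its own tunable parameters (empty here, meaning defaults).
gp_config = {
    'sklearn.gaussian_process.GaussianProcessRegressor': {
        'kernel': {
            'sklearn.gaussian_process.kernels.DotProduct': {},
            'sklearn.gaussian_process.kernels.RBF': {},
        },
    },
}
```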

hanshupe avatar Mar 25 '21 22:03 hanshupe