machine-learning
Preventing overfitting when evaluating many hyperparameters
In #18 I propose using a grid search to fit the classifier hyperparameters (notebook). We end up with the average performance across cross-validation folds for many hyperparameter combinations. Here's the performance visualization from the notebook:
So the question is: given a performance grid, how do we pick the optimal hyperparameter combination? Simply picking the highest performer can be a recipe for overfitting, since the top score may reflect noise in the cross-validation estimates rather than genuinely better generalization.
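One common remedy is the "one-standard-error rule": among all combinations whose mean CV score falls within one standard error of the best, prefer the most regularized (simplest) model. Here's a minimal sketch with scikit-learn; the toy data and the `C` grid are stand-ins for the notebook's actual features and hyperparameters:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy data standing in for the notebook's expression features (hypothetical).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Grid over regularization strength; smaller C = stronger regularization.
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(X, y)

means = search.cv_results_["mean_test_score"]
# Standard error of the mean across the CV folds.
sems = search.cv_results_["std_test_score"] / np.sqrt(search.n_splits_)

# One-standard-error rule: keep every combination whose mean score is
# within one SE of the best, then pick the most regularized among them.
threshold = means.max() - sems[means.argmax()]
candidates = [i for i, m in enumerate(means) if m >= threshold]
best_simple = min(candidates, key=lambda i: param_grid["C"][i])
chosen_C = param_grid["C"][best_simple]
print(f"chosen C = {chosen_C}")
```

This doesn't eliminate selection bias entirely, but it biases the choice toward simpler models whose apparent edge over the runners-up is within estimation noise.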
Here's a scikit-learn guide that doesn't answer my question directly but is still helpful. See also https://github.com/cognoma/machine-learning/issues/19#issuecomment-235927462, where overfitting has been mentioned. I'm paging @antoine-lizee, who has dealt with this issue in the past and can hopefully provide solutions from afar, as he lives in the Hexagon (France).