
include hyperparameter tuning

Open sp8rks opened this issue 3 years ago • 1 comment

Add a section on hyperparameter tuning, since the classical models were used with default hyperparameters.

sp8rks avatar Jan 20 '22 03:01 sp8rks

Suggestion for some of the commentary in the markdown cell about hyperparameter optimization. Feel free to edit as needed.

  • If evaluations are very inexpensive (e.g. millions of evaluations), go with grid-based, random, or Sobol points via e.g. sklearn.model_selection.GridSearchCV, sklearn.model_selection.RandomizedSearchCV, or skopt.sampler.Sobol, respectively. Grid-based may be good enough, but random is generally better than grid-based, and Sobol is generally better than random. To integrate Sobol with a CV search, see e.g. sklearn.model_selection.cross_validate
  • If evaluations are moderately inexpensive (e.g. tens of thousands of evaluations), go with a genetic algorithm via e.g. sklearn-genetic-opt or TPOT.
  • If evaluations are very expensive (e.g. hundreds of evaluations), go with Bayesian optimization via e.g. skopt.BayesSearchCV or Ax. BayesSearchCV is more lightweight and requires that the models being optimized match the scikit-learn estimator API. Ax has much more sophisticated Bayesian models, including automatic relevance determination (ARD) and corresponding feature importances, advanced handling of noise, and capabilities for handling high-dimensional datasets. It also offers several interfaces ranging from easy-to-use to heavily customizable, and it is a tool that we recommend.
  • Factors other than the expense of model evaluation, such as interpretability and ease of use, may also guide the choice of hyperparameter optimization scheme.
  • In our case, due to [inexpensive/moderately expensive/expensive] model evaluations for sklearn models and to maintain a lightweight environment, we choose to use [GridSearchCV/sklearn-genetic-opt/skopt.BayesSearchCV]; however, other options could have been used instead.
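For the markdown cell, a minimal sketch of the "inexpensive evaluations" option could look like the following. The estimator, parameter ranges, and synthetic data here are illustrative choices on my part, not something fixed by the notebook:

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data standing in for the real dataset
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

# Distributions to sample hyperparameters from (illustrative ranges)
param_distributions = {
    "n_estimators": randint(10, 200),  # number of trees
    "max_depth": randint(2, 10),       # maximum tree depth
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,       # number of sampled hyperparameter settings
    cv=5,            # 5-fold cross-validation per setting
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Swapping in GridSearchCV (with explicit grids instead of distributions) or skopt.BayesSearchCV (same fit/best_params_ interface) would follow the same pattern, which is part of why the sklearn estimator API is convenient here.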

sgbaird avatar Jan 27 '22 22:01 sgbaird