SuperLearner guide: fitting models, ensembling, prediction, hyperparameters, parallelization, timing, feature selection, etc.

SuperLearner Guide

A guide to using SuperLearner for prediction. This is now included as a vignette in the SuperLearner package.

Note: this tutorial is a bit out of date; some supplemental methods are now in my ck37r package.

SuperLearner Intro

  • Installing
  • Background
  • Create dataset
  • Review available models
  • Fit single models
  • Fit ensemble
  • Predict on new dataset
  • Customize a model setting
  • External cross-validation
  • Test multiple hyperparameter settings
  • Parallelize across CPUs
  • Distribution of ensemble weights
  • Feature selection (screening)
  • Optimize for AUC
  • XGBoost hyperparameter exploration

Intermediate

(To be created)

  • create.Learner() custom environments
  • SL.caret wrapper
  • Custom learner wrapper
  • Custom screener
  • Library analysis - cumulative
  • Library analysis - individual algorithms
  • Recombine SuperLearner

Advanced

(To be created)

  • Parallelize across computers (SLURM)
  • Repeated cross-validation
  • Data-adaptive V-selection for cross-validation
  • Multi-level meta-learning

Resources

Books:

Campus Groups:

Courses at Berkeley:

  • Stat 154 - Statistical Learning
  • CS 189 / CS 289A - Machine Learning
  • PH 252D - Causal Inference
  • PH 295 - Big Data
  • PH 295 - Targeted Learning for Biomedical Big Data
  • INFO - TBD

Also many Coursera offerings and other online classes.

References

LeDell, E., Petersen, M. L., & van der Laan, M. J. (2015). Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electronic Journal of Statistics, 9(1), 1583-1607.

Polley, E. C., & van der Laan, M. J. (2010). Super Learner in Prediction. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 266. http://biostats.bepress.com/ucbbiostat/paper266/

van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1).

van der Laan, M. J., & Rose, S. (2011). Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media.