
Hyperparameter Tuning

Open kmishra9 opened this issue 5 years ago • 2 comments

I'm curious -- as I haven't seen built-in hyperparameter optimization functionality in sl3, is there a recommended way to go about that? Right now I'm essentially using caret to tune, then taking the bestTune of every model I fit and plopping those arguments into the appropriate make_learner call. Any plans to build this into sl3, or is the workflow I'm describing essentially the recommended move?

kmishra9 avatar Aug 07 '19 17:08 kmishra9

Great question (thanks for all your high-quality feedback lately). As I understand it, you're essentially using caret to do a discrete Super Learner over a grid of hyperparameter values for each learner, and then combining those discrete SLs in a continuous SL (although hopefully not nesting the cross-validation between those two).

We need to better support building out a set of learners over a grid of hyperparameter values, and it's been on the to-do list for far too long (see, e.g., https://github.com/tlverse/sl3/issues/2). For now, this is unfortunately still a DIY thing.

Having enumerated such a grid for each learner, I think you would be better off just "concatenating" the grids together to form your continuous SL library instead of doing the two-stage SL. If you wanted to enforce sparsity in the set of learners selected by the Super Learner, you could do so by adjusting your metalearner with an appropriate constraint. It's worth having a worked example of this, so I'll plan to add one.
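
In the meantime, here's a rough sketch of what that grid-enumeration step could look like (the grid values and variable names below are just placeholders, not recommendations):

library(sl3)

# Enumerate a small grid of xgboost hyperparameters; each row becomes its own learner
xgb_grid = expand.grid(
    nrounds   = c(100, 500),
    max_depth = c(3, 6),
    eta       = c(0.05, 0.3)
)

xgb_learners = lapply(seq_len(nrow(xgb_grid)), function(i) {
    # pass the row's values as individual constructor arguments
    do.call(Lrnr_xgboost$new, as.list(xgb_grid[i, ]))
})

# "Concatenate" this grid with the rest of your library and stack as usual
stack = do.call(make_learner, c(list(Stack), xgb_learners))

The same pattern works for any learner whose constructor takes its tuning parameters as arguments, so you can build one grid per learner and concatenate them all into a single Stack.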

jeremyrcoyle avatar Aug 07 '19 19:08 jeremyrcoyle

Ah, so I'm actually using caret like this (which may provide a helpful basis for a worked example or vignette):

training_parameters = trainControl(
    method           = "cv",        # 5-fold cross-validation
    number           = 5,
    search           = "random",    # random search over the tuning space
    returnData       = TRUE,
    verboseIter      = TRUE,
    predictionBounds = c(0, 150),   # bound predictions to [0, 150]
    allowParallel    = TRUE
)

model_11 = train(
    x          = final_log_continuous_dataset_caret_boosting_train,
    y          = final_log_continuous_dataset_train %>% filter(num_claims > 9) %>% pull(eGFR),
    method     = "xgbTree",
    weights    = final_log_continuous_dataset_train %>% filter(num_claims > 9) %>% pull(num_eGFRs),
    trControl  = training_parameters,
    tuneLength = 70    # number of random hyperparameter sets to evaluate
)

So a couple of distinctions: random search instead of grid search, and each model is tuned on its own in caret rather than within the sl3 framework. I think I read a paper somewhere indicating that a tuneLength > 60 will, on average, reach hyperparameters that are at most 5% from optimal, so I use 70 for a little extra security. And then I do this with sl3:

train_task = make_sl3_Task(
    data         = final_log_continuous_dataset_train,
    covariates   = covariates,
    outcome      = outcome,
    outcome_type = "continuous",
    weights      = weights
)

[...]
# bestTune is a one-row data frame, so pass its values as individual arguments
# rather than as a single unlisted vector
lrnr_xgboost = do.call(make_learner, c(list(Lrnr_xgboost), as.list(model_11$bestTune)))

stack         = make_learner(Stack, lrnr_glm, lrnr_randomForest, lrnr_xgboost)
metalearner   = make_learner(Lrnr_nnls)
super_learner = Lrnr_sl$new(learners = stack,
                            metalearner = metalearner)

model_13 = super_learner$train(train_task)
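
For what it's worth, after fitting I also check whether the caret-tuned learners actually end up carrying weight in the ensemble. A rough sketch (the accessor names below are from sl3's fit objects and may differ slightly by version):

# metalearner weights assigned to each learner in the stack
model_13$coefficients

# cross-validated risk of each learner in the library
model_13$cv_risk(eval_fun = loss_squared_error)

# predictions from the fitted Super Learner on the training task
sl_preds = model_13$predict(train_task)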

Hope that's helpful! I also tuned the random forest model with caret in exactly the same way; that code is omitted for brevity.

kmishra9 avatar Aug 07 '19 20:08 kmishra9