Hyperparameter Tuning
I'm curious -- as I haven't seen built-in hyperparameter optimization functionality in sl3, is there a recommended way to go about doing that? Right now I'm essentially using caret to tune, then taking the bestTune of every model I fit and plopping those arguments into the appropriate make_learner call. Any plans to build this into sl3, or is the workflow I'm describing essentially the recommended move?
Great question (thanks for all your high-quality feedback lately). As I understand it, you're essentially using caret to do a discrete Super Learner over a grid of hyperparameter values for each learner, and then combining those discrete SLs in a continuous SL (although hopefully not nesting the cross-validation between those two).
We need to better support building out a set of learners over a grid of hyperparameter values, and it's been on the to-do list for far too long (see, e.g., https://github.com/tlverse/sl3/issues/2). For now, this is still a DIY thing, unfortunately.
Having enumerated such a grid for each learner, I think you would be better off just "concatenating" the grids together to form your continuous SL library instead of doing the two-stage SL. If you wanted to enforce sparsity in the set of learners selected by the Super Learner, you could do so by adjusting your metalearner with an appropriate constraint. It's worth having a worked example of this, so I'll plan to add one; a rough sketch is below.
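In the meantime, here's a minimal sketch of the "concatenated grid" idea, assuming xgboost is the learner being tuned; the grid values are purely illustrative and untested:

library(sl3)

# enumerate a small (illustrative) grid of xgboost hyperparameters
xgb_grid = expand.grid(
  max_depth = c(2, 4, 6),
  eta = c(0.05, 0.1, 0.3),
  nrounds = c(100, 500)
)

# turn each row of the grid into its own learner
xgb_learners = lapply(seq_len(nrow(xgb_grid)), function(i) {
  do.call(Lrnr_xgboost$new, as.list(xgb_grid[i, ]))
})

# "concatenate" the grid with whatever other learners you want in the library
learners = c(xgb_learners, list(Lrnr_glm$new(), Lrnr_mean$new()))
stack = do.call(Stack$new, learners)

sl = Lrnr_sl$new(learners = stack, metalearner = Lrnr_nnls$new())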
Ah, so I'm actually using caret like this (which may provide a helpful basis for a worked example or vignette):
library(caret)
library(dplyr)

# 5-fold CV with random hyperparameter search; predictions bounded to a plausible eGFR range
training_parameters = trainControl(
  method = "cv",
  number = 5,
  search = "random",
  returnData = TRUE,
  verboseIter = TRUE,
  predictionBounds = c(0, 150),
  allowParallel = TRUE
)

# random search over 70 xgboost configurations
model_11 = train(
  x = final_log_continuous_dataset_caret_boosting_train,
  y = final_log_continuous_dataset_train %>% filter(num_claims > 9) %>% pull(eGFR),
  method = "xgbTree",
  weights = final_log_continuous_dataset_train %>% filter(num_claims > 9) %>% pull(num_eGFRs),
  trControl = training_parameters,
  tuneLength = 70
)
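The winning configuration then ends up in model_11$bestTune (a one-row data frame of the selected hyperparameters), which is what I carry over to sl3 below:

# inspect the selected hyperparameters (one-row data frame)
model_11$bestTune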
So some distinctions: random search instead of grid search, and training one discrete model at a time rather than within the sl3 framework. I think I read a paper somewhere indicating that a tuneLength of > 60 will on average reach hyperparameters that are at most 5% from optimality, so 70 just for extra security. And then I do this with sl3:
library(sl3)

train_task = make_sl3_Task(
  data = final_log_continuous_dataset_train,
  covariates = covariates,
  outcome = outcome,
  outcome_type = "continuous",
  weights = weights
)
[...]
# splice caret's tuned hyperparameters into the learner as named arguments
lrnr_xgboost = do.call(make_learner, c(list(Lrnr_xgboost), as.list(model_11$bestTune)))
stack = make_learner(Stack, lrnr_glm, lrnr_randomForest, lrnr_xgboost)
metalearner = make_learner(Lrnr_nnls)
super_learner = Lrnr_sl$new(learners = stack,
                            metalearner = metalearner)
model_13 = super_learner$train(train_task)
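After that, predictions just come from the fit's predict method on a task, e.g.:

# predictions from the fitted Super Learner on the training task
sl_predictions = model_13$predict(train_task)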
Hope that's helpful! I also tuned the rf model with caret in the exact same way, omitted for brevity.