Missing learner (no weight = 0)
Hey there,
I fit a model with the following code:
# Define Super Learner
stack = make_learner(
  Stack,
  lrnr_glm,
  lrnr_randomForest,
  lrnr_xgboost,
  lrnr_xgboost_limited,
  lrnr_svm,
  lrnr_earth
)
metalearner = make_learner(Lrnr_nnls)
super_learner = Lrnr_sl$new(learners = stack, metalearner = metalearner)

# Sequential model training
model_13 = super_learner$train(train_task)
Everything looks good except that the Lrnr_earth_2_3_backward_0_1_0_0 learner, shown in the picture below, doesn't appear among the weights at all. It doesn't even have a weight of 0 like lrnr_glm does 🤷♂️, and yet training works when I run lrnr_earth$train() manually.
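For reference, by "weights" I mean the metalearner coefficients; a rough sketch of how they (and the cross-validated risks) can be printed, assuming the usual sl3 fit accessors (exact names may differ across versions):
# Rough sketch -- accessor names assumed from the sl3 docs, so double-check them
model_13$coefficients                    # metalearner (Lrnr_nnls) weight per learner
model_13$cv_risk(loss_squared_error)     # cross-validated risk table per learner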
I would guess it fails on one of the folds, hence it is removed from the list of learners used when fitting the final weights.
That makes sense -- it takes a while to run, so I've been making sure the lrnrs run on samples. Any ideas on how I could work out which fold or piece of data it's failing on?
Hmm, so I'm not sure it's the case that lrnr_earth is simply failing. I split the dataset up into chunks of 100 rows and trained lrnr_earth on each one, then verified that each had been trained and that there were no errors:
results = foreach(i = seq(from = 1, to = nrow(final_log_continuous_dataset_train), by = 100)) %do% {
  task = make_sl3_Task(
    data = final_log_continuous_dataset_train %>% dplyr::slice(i:(i + 99)),
    covariates = covariates,
    outcome = outcome,
    outcome_type = "continuous",
    weights = weights
  )
  return(lrnr_earth$train(task))
}
for (i in results) { i$assert_trained() }
In addition, when I ran the same super_learner on a sample_task instead of the training task from above, I got the same result (6 input learners, 5 learners with weights).
Finally, I ran the following code:
small_stack = make_learner(
  Stack,
  lrnr_earth
)
small_super_learner = Lrnr_sl$new(learners = small_stack, metalearner = metalearner)
scheduled_small_super_learner = Scheduler$new(
  delayed_object = delayed_learner_train(learner = small_super_learner, task = train_task),
  job_type = FutureJob,
  nworkers = cpus_logical,
  verbose = TRUE
)
scheduled_small_super_learner$compute()
This errors with the following:
... [below repeated a bunch of times]
Error in private$.train(subsetted_task, trained_sublearners) :
All learners in stack have failed
In addition: Warning message:
In private$.train(subsetted_task, trained_sublearners) :
Lrnr_earth_2_3_backward_0_1_0_0 failed with message: no function 'earth' could be found. It will be removed from the stack
updating Stack from ready to running
run:11 ready:0 workers:60
updating Stack from running to resolved
updating Stack from running to resolved
updating Stack from running to resolved
updating Stack from running to resolved
updating Stack from running to resolved
updating Stack from running to resolved
updating Stack from running to resolved
updating Stack from running to resolved
updating Stack from running to resolved
updating Stack from running to resolved
updating Stack from running to resolved
Failed on Stack
Error in self$compute_step() :
Error in private$.train(subsetted_task, trained_sublearners) :
All learners in stack have failed
That foreach loop that worked above? Doesn't work now:
Failed on Lrnr_earth_2_3_backward_0_1_0_0
... [repeat a bunch of times] ...
Failed on Lrnr_earth_2_3_backward_0_1_0_0
Error in { : task 1 failed - "no function 'earth' could be found"
However, this seems to show the output I'd expect:
getS3method("earth", "default")
Just a few short comments: for manually checking which fold it fails on, you would want to grab the folds generated by sl3, with the corresponding samples in each, for testing purposes.
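Something along these lines could work as a starting point (rough, untested sketch -- I'm assuming the task exposes its origami folds via train_task$folds and that each fold carries a training_set index vector; double-check against your sl3 version):
# Refit lrnr_earth on each fold's training rows and catch any error per fold
fold_fits = lapply(train_task$folds, function(fold) {
  fold_task = make_sl3_Task(
    data = final_log_continuous_dataset_train[fold$training_set, ],
    covariates = covariates,
    outcome = outcome,
    outcome_type = "continuous",
    weights = weights
  )
  tryCatch(lrnr_earth$train(fold_task), error = function(e) conditionMessage(e))
})
# Character entries are error messages; these are the folds that broke
failed = which(sapply(fold_fits, is.character))
fold_fits[failed]
Any fold that shows up in failed is the one whose rows are worth pulling out and inspecting directly.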
Just from glancing at the error, it looks like earth is not loaded/installed. Can you provide your sessionInfo()?
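Also, since the failure only shows up once the FutureJob workers are involved, it might be worth confirming that earth is installed in a library the workers can actually see. A quick, hypothetical check with the future package (assuming it mirrors the backend your Scheduler uses) would be something like:
library(future)
plan(multisession, workers = cpus_logical)  # assumed to match your FutureJob backend
f = future({
  list(
    earth_loads = requireNamespace("earth", quietly = TRUE),  # can the worker load earth?
    lib_paths   = .libPaths()                                 # which libraries does it see?
  )
})
value(f)
If earth_loads comes back FALSE, or the worker's .libPaths() is missing the library where earth lives, that would explain why the interactive fit works while the scheduled one fails.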