mlrMBO
Imputation still not stopping errors
Error in stats::optim(foo.vals, fn = fn1, gr = gr1, method = "L-BFGS-B", :
non-finite value supplied by optim
It's hard to post a reproducible example here because the data file I'm using is large and proprietary, but I frequently get this error (which crashes and stops the optimization) even though I have two different imputation failsafes active, as below:
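library(mlr)
library(mlrMBO)

# Impute a failed evaluation with a random forest fitted on the optimization path so far.
impute_fun <- function(x, y, opt.path, ...) {
  task <- makeRegrTask(data = opt.path$env$path, target = opt.path$y.names)
  lrn <- makeLearner("regr.randomForest", ntree = 500)
  mdl <- train(lrn, task)
  pred <- predict(mdl, newdata = x)
  return(pred$data$response)
}

# Failsafe 1: mlrMBO-level imputation. Failsafe 2: mlr-level impute.val in the tune control.
mc <- makeMBOControl(impute.y.fun = impute_fun)
tc <- makeTuneControlMBO(impute.val = 0.5, budget = 75L, mbo.control = mc)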
Why is the optimization process crashing rather than using one of the available imputation functions? I'm using a tuning wrapper around four different XGBoost learners, running benchmark() on a single task, and parallelizing with parallelStartSocket (Windows) with level set to "mlr.benchmark".
BTW, since I'm doing this as a parallelized benchmarking exercise, I keep losing everything when this fails (I can get data from the logs, but that is not a scalable solution). I'm not sure how to use save.on.disk.at in that context.
The component that is failing is the optimization of the infill criterion. At the moment I can't reproduce this error, and I can't yet think of a way to trigger it.
Also, impute.y.fun should not have any effect here, since learner crashes are handled by mlr through impute.val. Is there any particular reason you chose 0.5 rather than the default?
However, you can use save.on.disk.at = 1:75 so that the result will be stored to disk under save.file.path = "mbo_run.RData" after each MBO iteration. Then you can investigate there.
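As a rough sketch of that suggestion, the control objects from the original post could be extended as shown here. The 75 iterations and the file name come from this thread; placing the file in the working directory is only an assumption for illustration.

```r
library(mlr)
library(mlrMBO)

# Store the optimization state on disk after every MBO iteration.
# The path below (working directory) is an assumption; use any writable location.
mc <- makeMBOControl(
  save.on.disk.at = 1:75,
  save.file.path = file.path(getwd(), "mbo_run.RData")
)
tc <- makeTuneControlMBO(impute.val = 0.5, budget = 75L, mbo.control = mc)

# After a crash, the saved state can be inspected in a fresh session, e.g.:
# load(file.path(getwd(), "mbo_run.RData"))
```

With several tuned learners running in parallel, giving each tune control its own save.file.path is probably advisable so they do not overwrite each other's state. That is an assumption about the benchmark setup rather than something confirmed in this thread.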
I ran into this in the past as well, and what helped me was setting on.surrogate.error = "warn". With this setting a random point is proposed when the surrogate learner errors, and the process stays alive.
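A minimal sketch of that setting, reusing the budget and impute.val from the original post. Whether it also catches the infill-optimization failure described earlier in the thread is not confirmed here.

```r
library(mlr)
library(mlrMBO)

# Warn instead of stopping when the surrogate model fails;
# mlrMBO then proposes a random point for that iteration.
mc <- makeMBOControl(on.surrogate.error = "warn")
tc <- makeTuneControlMBO(impute.val = 0.5, budget = 75L, mbo.control = mc)
```

The same makeMBOControl() call can also carry impute.y.fun and the save.on.disk.at options, so the settings discussed in this thread can be combined in one control object.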