ParBayesianOptimization
Error in if (r == 0) stop("Results from FUN have 0 variance, cannot build GP.") :
Hi, I got the above error when using the bayesOpt() function. I've checked that both the data and labels are valid.
Here's the full output:
Running initial scoring function 8 times in 4 thread(s)... 2130.505 seconds
Starting Epoch 1
1) Fitting Gaussian Process...
Error in if (r == 0) stop("Results from FUN have 0 variance, cannot build GP.") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In .Internal(gc(verbose, reset, full)) :
closing unused connection 7 (<-localhost:11016)
2: In .Internal(gc(verbose, reset, full)) :
closing unused connection 6 (<-localhost:11016)
3: In .Internal(gc(verbose, reset, full)) :
closing unused connection 5 (<-localhost:11016)
4: In .Internal(gc(verbose, reset, full)) :
closing unused connection 4 (<-localhost:11016)
And here's the code that I've run:
library(ParBayesianOptimization)
library(xgboost)
library(doParallel)
nthread <- 16
##data prep
load("label.rdata")
load("dat.rdata")
##optimizer
obj_func <- function(eta, max_depth, min_child_weight, subsample, lambda, alpha) {
  xgb_train_dat <- xgb.DMatrix(data = train_feat_noNA, label = train_outcome_noNA)
  param <- list(
    # Hyperparameters
    eta = eta,
    max_depth = max_depth,
    min_child_weight = min_child_weight,
    subsample = subsample,
    lambda = lambda,
    alpha = alpha,
    # Tree model
    booster = "gbtree",
    # Regression problem
    objective = "reg:squarederror",
    # Use the Mean Absolute Percentage Error
    eval_metric = "mae")
  xgbcv <- xgb.cv(params = param,
                  data = xgb_train_dat,
                  nrounds = 500,
                  nfold = 5,
                  early_stopping_rounds = 5,
                  verbose = 0,
                  maximize = FALSE)
  lst <- list(
    # First argument must be named "Score"
    # Function finds maxima so inverting the output
    Score = -min(xgbcv$evaluation_log$test_mape_mean),
    # Get number of trees for the best performing model
    nrounds = xgbcv$best_iteration
  )
  return(lst)
}
bounds <- list(eta = c(0.0001, 0.1),
               max_depth = c(1L, 110L),
               min_child_weight = c(1, 50),
               subsample = c(0.1, 1),
               lambda = c(0.001, 1000),
               alpha = c(0.001, 1000))
set.seed(123)
# Set up parallel processing
cl <- makeCluster(nthread)
registerDoParallel(cl)
clusterExport(cl, c('train_feat_noNA', 'train_outcome_noNA'))
clusterEvalQ(cl, expr = {
  library(xgboost)
})
bayes_out <- bayesOpt(FUN = obj_func, bounds = bounds,
                      iters.n = nthread, iters.k = nthread,
                      initPoints = length(bounds) + 2, parallel = TRUE)
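A quick sanity check before launching a long parallel run is to call the scoring function once directly and confirm the Score is finite. A minimal sketch, with arbitrary hyperparameter values picked from within the bounds:

chk <- obj_func(eta = 0.05, max_depth = 6, min_child_weight = 10,
                subsample = 0.8, lambda = 1, alpha = 1)
str(chk)              # Score should be a finite number, not -Inf or NA
is.finite(chk$Score)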
So I discovered that the error disappears if I use eval_metric = "mape" instead of eval_metric = "mae". This is likely because xgb.cv names its evaluation-log columns after the configured metric: with eval_metric = "mae" the column is test_mae_mean, so the test_mape_mean column my scoring function reads does not exist, min(NULL) returns Inf with a warning, and every Score comes back as -Inf.
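A quick way to confirm the mismatch is to run the xgb.cv call from the scoring function once interactively and inspect the log's column names:

names(xgbcv$evaluation_log)
# "iter" "train_mae_mean" "train_mae_std" "test_mae_mean" "test_mae_std"

xgbcv$evaluation_log$test_mape_mean  # NULL: no such column under eval_metric = "mae"
min(NULL)                            # Inf, with a "no non-missing arguments" warning
-min(NULL)                           # -Inf: the Score that gets handed to bayesOpt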
This error means that all of the outputs from the tuning function have the same value. This causes singularity issues when trying to train a Gaussian Process.
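Both flavours of this failure can be reproduced in isolation with plain R; the NaN case is what a vector of -Inf scores (for instance from a missing evaluation-log column) produces, and it matches the "missing value where TRUE/FALSE needed" variant in the output above:

scores <- c(0.7, 0.7, 0.7)       # every call to FUN returned the same Score
max(scores) - min(scores)        # 0 -> "Results from FUN have 0 variance, cannot build GP."

scores <- c(-Inf, -Inf, -Inf)    # e.g. Score = -min(NULL) on every run
r <- max(scores) - min(scores)   # -Inf - (-Inf) is NaN
r == 0                           # NA -> if (r == 0) fails with the error above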
I am running a binary classification problem and getting a similar issue. From my understanding, the error is coming from the zeroOneScale function:

# Scale a vector between 0-1
zeroOneScale <- function(vec) {
  r <- max(vec) - min(vec)
  # If the scoring function returned the same results
  # this results in the function returning a vector of 1s.
  if (r == 0) stop("Results from FUN have 0 variance, cannot build GP.")
  vec <- (vec - min(vec)) / r
  return(vec)
}
I think this is trying to scale my binary data, which I do not want to happen.
Similar to the OP, this problem goes away when I use 'auc' as my evaluation metric, but my dataset is highly skewed, and I want to test different metrics to see whether they affect the tuning.
Here's my code for reference:
obj_func <- function(eta, max_depth, min_child_weight, subsample, lambda, alpha, nfolds) {
  dtrain <- xgb.DMatrix(data = as.matrix(train_data[, -3]),
                        label = train_data$hidden_hypoxemia, missing = NA)
  param <- list(
    eta = eta,
    max_depth = max_depth,
    min_child_weight = min_child_weight,
    subsample = subsample,
    lambda = lambda,
    alpha = alpha,
    # Tree model
    booster = "gbtree",
    objective = "binary:logistic",
    eval_metric = "logloss"
  )
  xgbcv <- xgb.cv(params = param,
                  data = dtrain,
                  nrounds = 50,
                  nfold = nfolds,
                  prediction = TRUE,
                  early_stopping_rounds = 10,
                  verbose = 2,
                  maximize = TRUE,
                  stratified = TRUE)
  lst <- list(
    # First argument must be named "Score"
    # Function finds maxima so inverting the output
    Score = suppressWarnings(min(xgbcv$evaluation_log$test_auc_mean)),
    # Get number of trees for the best performing model
    nrounds = xgbcv$best_iteration
  )
  return(lst)
}
param_bounds <- list(eta = c(0.001, 0.15),
                     max_depth = c(1L, 10L),
                     min_child_weight = c(1, 50),
                     subsample = c(0.1, 1),
                     lambda = c(1, 10),
                     alpha = c(1, 10),
                     # Bound name matches the nfolds argument of obj_func
                     nfolds = c(3L, 10L))
bayes_out <- bayesOpt(FUN = obj_func, bounds = param_bounds,
                      initPoints = length(param_bounds) + 2, iters.n = 3)
Any help with this would be greatly appreciated.
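Going by the column-name diagnosis above, the likely culprit here is the same mismatch: with eval_metric = "logloss" the CV log column is test_logloss_mean, so test_auc_mean is NULL and suppressWarnings(min(NULL)) quietly turns every Score into Inf. A minimal sketch of a Score block that matches the configured metric, keeping logloss (a loss to minimize, hence the negation, since bayesOpt maximizes Score):

lst <- list(
  # test_logloss_mean is the column xgb.cv actually writes for this metric;
  # negate it because bayesOpt searches for the maximum Score
  Score = -min(xgbcv$evaluation_log$test_logloss_mean),
  nrounds = xgbcv$best_iteration
)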