gbm.step() doesn't iterate for large continuous response variables
The iteration loop in gbm.step() never starts for some continuous response variables with large values.
library(dismo)
data(Anguilla_train)
Anguilla_train = Anguilla_train[1:200,]
fitcont = gbm.step(data = Anguilla_train, gbm.x = c(3:5, 7:14), gbm.y = 6, family = "gaussian",
                   tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5)
#> GBM STEP - version 2.9
#>
#> Performing cross-validation optimisation of a boosted regression tree model
#> for DSDist and using a family of gaussian
#> Using 200 observations and 11 predictors
#> creating 10 initial models of 50 trees
#>
#> folds are unstratified
#> total mean deviance = 8013.378
#> tolerance is fixed at 8.0134
#> ntrees resid. dev.
#> 50 5300.797
#> now adding trees...
#> mean total deviance = 8013.378
#> mean residual deviance = 4755.488
#>
#> estimated cv deviance = 5300.796 ; se = 365.367
#>
#> training data correlation = 0.848
#> cv correlation = 0.707 ; se = 0.075
#>
#> elapsed time - 0.01 minutes
I poked around a bit in gbm.step() and I believe this is caused by the delta.deviance variable, which is part of the condition of the while() loop that adds trees in increments of the step size. It is hard-coded to 1 before the loop starts, which works well for family = "bernoulli" and for continuous responses with a smaller range.
For continuous responses with a large range, the loop condition delta.deviance > tolerance.test is already false on the first check: delta.deviance is 1 and tolerance.test is mean.total.deviance * tolerance, which exceeds 1 once the mean total deviance is large enough. In the example above tolerance.test is 8.0134, so the while loop never starts and no trees are added beyond the initial 50.
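To make the arithmetic concrete, here is a minimal base R sketch of the first check of the loop condition, using the numbers printed in the output above. It is not the gbm.step() source: the tolerance value of 0.001 is an assumption inferred from the ratio 8.0134 / 8013.378, and any other conditions the real loop tests are left out.

```r
# Value taken from the reprex output above
mean.total.deviance <- 8013.378

# Assumption: tolerance of 0.001, inferred from "tolerance is fixed at 8.0134"
tolerance <- 0.001

# Threshold as described in this issue
tolerance.test <- mean.total.deviance * tolerance

# Hard-coded starting value described in this issue
delta.deviance <- 1

# First evaluation of the while() condition (other conditions omitted)
loop.starts <- delta.deviance > tolerance.test
print(tolerance.test)
print(loop.starts)
```

With these numbers the condition is FALSE on entry, which matches the loop never running.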
I tried changing the hard-coded delta.deviance from 1 to mean.total.deviance and things appeared to work fine for bernoulli and gaussian models. However, I don't know what other repercussions this has.
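A small sketch of why that change lets the loop start, again in base R rather than the actual gbm.step() code. The helper loop_starts() is hypothetical and only mirrors the entry check described here; the default tolerance of 0.001 is an assumption.

```r
# Hypothetical helper: does the while() condition hold on the first check?
# Only the deviance part of the condition is modelled.
loop_starts <- function(initial.delta, mean.total.deviance, tolerance = 0.001) {
  tolerance.test <- mean.total.deviance * tolerance
  initial.delta > tolerance.test
}

dev <- 8013.378  # mean total deviance from the example output

starts.hardcoded <- loop_starts(1, dev)    # current start value of 1
starts.proposed  <- loop_starts(dev, dev)  # start value set to mean.total.deviance

print(c(hardcoded = starts.hardcoded, proposed = starts.proposed))
```

Starting from mean.total.deviance, the entry check reduces to mean.total.deviance > mean.total.deviance * tolerance, which holds for any positive deviance as long as tolerance is below 1. This says nothing about how later iterations behave.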
Another way around the problem without changing the function is to make tolerance very small for such variables so that tolerance.test falls below 1 (though this may have other impacts), or to scale the response variable. If these are the best fixes, could they be added as suggestions in the documentation?
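Both workarounds can be sketched in base R. The response y below is simulated as a hypothetical stand-in for a large-range variable, and mean_total_deviance() assumes the gaussian mean total deviance is the mean squared deviation from the response mean, which I have not checked against the package source.

```r
set.seed(1)
y <- runif(200, min = 0, max = 300)  # hypothetical large-range response

# Assumed form of the gaussian mean total deviance
mean_total_deviance <- function(y) mean((y - mean(y))^2)

dev.raw <- mean_total_deviance(y)

# Workaround 1: pick a tolerance so that tolerance.test lands below 1
tolerance.small <- 0.5 / dev.raw
test.small <- dev.raw * tolerance.small

# Workaround 2: scale the response to unit standard deviation,
# then keep the assumed default tolerance of 0.001
y.scaled <- y / sd(y)
dev.scaled <- mean_total_deviance(y.scaled)
test.scaled <- dev.scaled * 0.001

print(c(small.tolerance = test.small, scaled.response = test.scaled))
```

In the first case, tolerance.small would be passed as the tolerance argument of gbm.step(). In the second, the model is fitted to the scaled column, so predictions need multiplying by sd(y) to get back to the original units.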
Created on 2021-06-16 by the reprex package (v2.0.0)