mboost
mboost copied to clipboard
excessive runtime of coef.mboost
Profiling coef.mboost for a colleague, I have noticed that it seems to waste time to "check if base-learner has coefficients" here:
https://github.com/boost-R/mboost/blob/021e406f40018b23727f6b75c1f2297e82cc1c7c/R/mboost.R#L430
I don't know what the original intention was (@hofnerb), but given that cf is a numeric matrix, I think that condition will always be FALSE.
In the particular application, the base learner was a Markov random field over 327 regions. Here are the runtimes I get for evaluating the above condition as a function of mstop:
for (mstop in 10^(2:5)) {
cat(mstop, ": ")
cf <- matrix(0, 327, mstop) # the actual values are irrelevant for this runtime assessment
cat(system.time(any(sapply(cf, is.null)))["elapsed"], "s\n")
}
#> 100 : 0.009 s
#> 1000 : 0.109 s
#> 10000 : 1.094 s
#> 1e+05 : 20.04 s
Maybe that condition should be fixed to really "check if base-learner has coefficients". Otherwise the ret <- NULL case could simply be dropped. It would fail anyway when trying to set names on NULL later on.
The intention is that not all baselearners have coefficients. If I am remembering correctly the only BL without coefficients is btree, though:
library("mboost")
mod <- mboost(dist ~ btree(speed), data = cars)
coef(mod)
# $`btree(speed)`
# NULL
#
# attr(,"offset")
# [1] 42.98
I just checked your fix propsal. It breaks with the following error
Fehler in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
'x' muss numerisch sein
We hence cannot simply remove the check. I am wondering a bit, though, why we have the sapply bit. Are there lists of coefficients or does (is.null(cf)) not work for specific coefficient objects we might have?
Ok, I see. Maybe the check could be done earlier and more efficiently? Something like https://github.com/bastistician/mboost/commit/50a03928f6d884bbd3a73c2115b1f26a4d3f00ce ? Of course, being an mboost noob, I have no idea whether checking the first element for being NULL is sufficient to conclude that the base learner has no coefs.