mboost
function to summarize loss reductions achieved by each base-learner
Almond Stöcker and Tobi Kühn have started to write a function that extracts the amount of risk reduction contributed by each base-learner from a fitted mboost model.
We'd like to see this included in mboost, with visualisation options if possible, as it answers a question that comes up frequently when postprocessing and interpreting boosting fits: which base-learners are most important for the fit?
See code below.
# variable importance
RelRisk <- function(model, ohne1 = FALSE)
{
    baselearner <- names(model$baselearner)
    learner.type <- sapply(strsplit(baselearner, "\\("), "[[", 1)
    learner.type.new <- as.vector(sapply(learner.type, FUN = function(x)
        switch(x, bols = "(linear)",
                  bbs = "(nonlinear)",
                  bspatial = "(spatial)")))
    var.names <- as.vector(variable.names(model))
    baselearner.short <- paste(var.names, learner.type.new)
    n <- length(model$response)
    # Which base-learners were selected during boosting:
    selected <- model$xselect()
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Empirical risk:
    # Initial risk of the intercept (offset) model:
    Risk0 <- with(model, family@risk(response, offset))
    # Risk after each boosting step:
    Risk <- model$risk()
    # Risk reduction per step:
    RiskDif <- c(Risk0, Risk[-length(Risk)]) - Risk
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # If desired, drop the first step:
    if (ohne1) {
        RiskDif <- RiskDif[-1]
        selected <- selected[-1]
    }
    # Explained risk attributed to each base-learner:
    explained <- rep(0, length(baselearner))
    for (i in 1:length(baselearner)) explained[i] <- sum(RiskDif[which(selected == i)])
    # Selection frequency of the base-learners:
    frequence <- rep(0, length(baselearner))
    for (i in 1:length(baselearner)) frequence[i] <- mean(selected == i)
    par(mar = c(5, 13, 4, 2) + 0.1)
    b <- barplot(height = explained / n, names.arg = baselearner.short, las = 1,
                 horiz = TRUE, main = paste("Final risk =", Risk[length(Risk)] / n))
    text(x = max(explained / n) / 10, y = b, labels = frequence)
    par(mar = c(5, 4, 4, 2) + 0.1)
}
> cars.gb <- gamboost(dist ~ bols(speed) + bbs(speed, center=TRUE), data = cars,
+ control = boost_control(mstop = 50))
> RelRisk(cars.gb)
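One thing such a function could do, besides plotting, is return its numbers, so that they can be reused for other visualisations (e.g. the ggplot variant mentioned further down). Below is only a sketch along those lines; it mirrors the bookkeeping in RelRisk above (xselect(), risk(), the family@risk slot), and relrisk_table is a hypothetical name, not an existing mboost function.

## Sketch: same quantities as RelRisk(), returned as a data.frame instead of a plot.
## Assumes a fitted mboost model, e.g. `cars.gb` from the example above.
relrisk_table <- function(model) {
    baselearner <- names(model$baselearner)
    selected    <- model$xselect()
    Risk0       <- with(model, family@risk(response, offset))  # risk of the offset model
    Risk        <- model$risk()                                 # risk after each boosting step
    RiskDif     <- c(Risk0, Risk[-length(Risk)]) - Risk         # risk reduction per step
    explained   <- sapply(seq_along(baselearner),
                          function(i) sum(RiskDif[which(selected == i)]))
    frequence   <- sapply(seq_along(baselearner),
                          function(i) mean(selected == i))
    data.frame(baselearner,
               explained = explained / length(model$response),
               frequence = frequence)
}

relrisk_table(cars.gb)   # cars.gb as fitted in the example above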

Any input on what such a function should or should not do would be highly appreciated!
Hi Fabian,
awesome!
Some notes/thoughts:
- This does not work for a binary response because of the risk implementation in mboost (note that factors are encoded as (-1, 1)): Risk0 will be 0, and a negative importance is assigned to the first base-learner.
- Support for glmboost would be nice.
- Decreasing order of the bars.
- A fixed number of digits for frequence (a rough sketch for this and the previous point follows below).
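For the last two points, one possible tweak to the plotting lines of RelRisk; this is only a sketch, and the variable names (explained, frequence, baselearner.short, Risk, n) are those defined inside the function above:

## Sort bars by explained risk (largest on top of the horizontal barplot)
## and print the selection frequencies with a fixed number of digits.
ord <- order(explained)
par(mar = c(5, 13, 4, 2) + 0.1)
b <- barplot(height = explained[ord] / n, names.arg = baselearner.short[ord],
             las = 1, horiz = TRUE,
             main = paste("Final risk =", round(Risk[length(Risk)] / n, 4)))
text(x = max(explained / n) / 10, y = b, labels = sprintf("%.2f", frequence[ord]))
par(mar = c(5, 4, 4, 2) + 0.1)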
Here is how it looks for my data:
RelRisk(zero)
RelRisk(zero, ohne1 = FALSE)

And, for comparison, my ggplot solution: zero_importance.pdf