Aggregate local_interactions to estimate shap with interactions

Open aruaud opened this issue 4 years ago • 5 comments

Hi, thanks for the package! I was wondering how the variable order is set when calculating the local interactions, and whether there is a way to randomize that order so the contributions can be measured repeatedly under different orders (giving an estimate closer to what SHAP would output). I tried passing different variable orders to local_interactions(..., order = ), but it does not change anything, so I don't know if I am missing a step.

Script example:

# get the variable names and interactions
tmp <- colnames(X)
tmp <- combn(tmp, m = 2)
tmp <- unlist(lapply(asplit(tmp, MARGIN = 2), paste, collapse = ':'))
varN <- c(colnames(X), tmp)

# create different orders
var_orders <- list()
for (i in 1:5){
    set.seed(i)
    var_orders[[i]] <- sample(varN)
}

# get the contributions for different orders
res <- list()
i <- 1
for (vo in var_orders){
    res[[i]] <- local_interactions(new_observation = X[1,], x = explain_rf, interaction_preference = 10, order = vo)
    i <- i+1
}

aruaud • Feb 01 '21 16:02

Hi, I have a minimal example of the change in variable order:

library("DALEX")
library("iBreakDown")
set.seed(1313)
model_titanic_glm <- glm(survived ~ .,
                         data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[,-8],
                               y = titanic_imputed$survived,
                               label = "glm")

bd_glm <- local_interactions(explain_titanic_glm, titanic_imputed[1, ], order=6:1)
bd_glm

bd_glm <- local_interactions(explain_titanic_glm, titanic_imputed[1, ], order=1:6)
bd_glm

bd_glm <- local_interactions(explain_titanic_glm, titanic_imputed[1, ], order=c('age:gender', 'class', 'embarked', 'fare', 'sibsp'))
bd_glm

bd_glm <- local_interactions(explain_titanic_glm, titanic_imputed[1, ], order=c('age:gender', 'embarked:class', 'sibsp:fare'))
bd_glm

SHAP values can be estimated by averaging contributions over different variable orders using the shap() function: https://modeloriented.github.io/iBreakDown/reference/break_down_uncertainty.html More on these methods can be found in the EMA e-book: http://ema.drwhy.ai/shapley.html
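To make the idea of averaging over orders concrete, here is a minimal base-R sketch. It does not use iBreakDown at all: the model `predict_fun`, the data `background`, and the helper `contributions_for_order` are hypothetical names made up for this illustration, and the package's internals may well differ. It only covers single variables, not interactions.

```r
# Toy model with an interaction term (hypothetical, for illustration only)
predict_fun <- function(data) {
  2 * data$x1 + data$x2 * data$x3
}

set.seed(1)
background <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
new_obs <- data.frame(x1 = 1, x2 = 2, x3 = -1)

# Break-down style contributions for one variable order:
# fix the variables to the new observation's values one at a time
# and record the change in the mean prediction after each step.
contributions_for_order <- function(order) {
  data <- background
  baseline <- mean(predict_fun(data))
  contrib <- setNames(numeric(length(order)), order)
  for (v in order) {
    data[[v]] <- new_obs[[v]]
    current <- mean(predict_fun(data))
    contrib[v] <- current - baseline
    baseline <- current
  }
  contrib
}

# Repeat for B random orders and average the per-variable contributions
vars <- colnames(background)
B <- 50
paths <- t(sapply(seq_len(B), function(b) {
  contributions_for_order(sample(vars))[vars]
}))
shap_estimate <- colMeans(paths)
shap_estimate
```

For every order the contributions sum to the prediction for `new_obs` minus the mean prediction on `background`, so the averaged estimate keeps that property. Only `x2` and `x3` change with the order here, because they are the ones that interact in the toy model.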

hbaniecki • Feb 01 '21 19:02

Thanks Hubert! I tried your example and it indeed works fine :) However, when I pass an order containing all variables and all possible interactions, I no longer get any interactions, only the contributions of single variables. Is it that not all interactions can be passed to the function?

And thanks for pointing to the shap() function! I had been using it but could not find a way to calculate SHAP values for interactions with it, which is why I switched to the local_interactions() function.

aruaud • Feb 02 '21 08:02

I believe that each variable can be mentioned only once, e.g. if 'age' is present, then 'age:gender' cannot be used. Additionally, I see that when passing interactions as strings, only one naming convention is accepted, e.g. 'age:gender' but not 'gender:age'.
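As a rough way to check an order vector against the "each variable only once" rule before passing it on, something like the following could be used. The helper `order_uses_each_variable_once` is a hypothetical name and not part of iBreakDown. It only checks for repeated variables and knows nothing about which spelling of an interaction name the package expects.

```r
# Hypothetical helper: TRUE if no variable is mentioned more than once,
# counting both single terms and the parts of "a:b" interaction terms.
order_uses_each_variable_once <- function(order) {
  vars <- unlist(strsplit(order, ":", fixed = TRUE))
  !any(duplicated(vars))
}

# Each variable appears once: passes the check
ok <- order_uses_each_variable_once(
  c("age:gender", "class", "embarked", "fare", "sibsp")
)

# 'age' appears both alone and inside 'age:gender': fails the check
bad <- order_uses_each_variable_once(c("age", "age:gender", "class"))

print(c(ok = ok, bad = bad))
```

An order built from all single variables plus all pairwise interactions, as in the first script of this thread, would fail this check.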

As for SHAP with interactions, I think that it would be a great feature/method to consider.

hbaniecki • Feb 02 '21 10:02

I see, thanks Hubert for the clarification! So not all pairwise interactions can be assessed at once, nor can single variables be assessed together with their interactions. That could be a nice feature too :) Looking forward to the SHAP interactions!

aruaud • Feb 02 '21 10:02

I think this could remain open

hbaniecki • Feb 02 '21 10:02