XGBoost: sum of weights does not match model prediction

Open noleto opened this issue 6 years ago • 7 comments

Hello everyone,

Local explanations from eli5.explain_prediction yield confusing results when applied to an XGBoost model: the sum of contributions (computed from all leaf values) does not match the model prediction. This happens when base_score != 0 (which is the case by default for XGBRegressor and XGBClassifier).

Here is the code to reproduce the issue: https://gist.github.com/noleto/987eb668e785a69e87ebf29f56fda55d (Jupyter Notebook format)

So the question is: should ELI5 add base_score to the local score (so that it is consistent with the model prediction), or just better document how to interpret the sum of weights?

Either way, the method's current behavior can be misleading.
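
To make the mismatch concrete, here is a minimal sketch of the arithmetic. It does not call eli5 or XGBoost; all numbers are made up, and it assumes a plain regression objective where the raw prediction is simply base_score plus the sum of the leaf values reached in each tree (other objectives apply a link function on top of this).

```python
# Hypothetical numbers for a toy boosted regressor with three trees.
# Assumption: raw prediction = base_score + sum of reached leaf values.
base_score = 0.5                     # constant offset added by the model
leaf_values = [0.12, -0.03, 0.08]    # leaf reached by one sample in each tree

# Total obtained when only leaf values are summed (the behavior reported here).
sum_of_leaves = sum(leaf_values)

# What the model itself returns for the same sample under the assumption above.
model_prediction = base_score + sum_of_leaves

# The difference between the two is exactly the base_score.
gap = model_prediction - sum_of_leaves

print(f"sum of leaf contributions: {sum_of_leaves:.2f}")
print(f"model prediction:          {model_prediction:.2f}")
print(f"unexplained gap:           {gap:.2f}")
```

With base_score set to 0 the gap disappears, which is why the discrepancy only shows up for non-zero values.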

My 2 cents,

noleto avatar Aug 06 '18 15:08 noleto

This might be the same issue as #251

lopuhin avatar Mar 04 '19 14:03 lopuhin

@lopuhin What should I do here: change the code so that it adds base_score to the eli5 score, or update the documentation?

coderop2 avatar Mar 05 '19 16:03 coderop2

@coderop2 I think ideally we should make the score shown by eli5 equal to the model score, and also make it clear where this comes from (so show the explicit contribution of base_score somewhere), so that the sum of all feature scores and the base score is equal to the total score.

lopuhin avatar Mar 06 '19 07:03 lopuhin

Or maybe we could add base_score to the bias?

lopuhin avatar Mar 06 '19 07:03 lopuhin

Many thanks, guys, for moving forward on this issue. From a tree-based model perspective, base_score can be seen as a kind of bias, so it doesn't shock me to add both together as the total "bias". However, for someone who wants to decompose each part of the explanation, this can be confusing: the real bias in a tree model represents the mean of the dataset, and they may wonder why the value shown here is different. So, +1 for showing the explicit contribution of base_score (if any) in eli5.explain_prediction.

My 2cts,

noleto avatar Mar 06 '19 21:03 noleto

Thanks @noleto, making base_score explicit makes sense 👍

lopuhin avatar Mar 07 '19 07:03 lopuhin

What I propose is to add two rows to the HTML template: the first row shows the base score, and the second shows the sum of base_score + eli5 score. This way we explicitly mention the base score of the estimator.
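
A rough sketch of that layout, using plain text rather than the actual HTML template. The function name `build_rows`, the row labels, and all numbers are hypothetical and are not part of eli5's API; the point is only that the two extra rows make the total match the model score.

```python
def build_rows(feature_weights, bias, base_score):
    """Return (label, value) rows, ending with the two proposed extra rows.

    Hypothetical helper for illustration only; not an eli5 function.
    """
    rows = list(feature_weights.items())
    rows.append(("<BIAS>", bias))
    # Score as currently shown: feature contributions plus bias.
    eli5_score = sum(value for _, value in rows)
    # Proposed row 1: the estimator's base_score on its own line.
    rows.append(("base_score", base_score))
    # Proposed row 2: the total, which should equal the model score.
    rows.append(("base_score + eli5 score", base_score + eli5_score))
    return rows


# Made-up contributions for a single prediction.
rows = build_rows({"x0": 0.10, "x1": -0.04}, bias=0.11, base_score=0.5)
for label, value in rows:
    print(f"{label:<26}{value:+.3f}")
```

Keeping base_score on its own row, instead of folding it into the bias, leaves the bias value interpretable on its own, which is the concern raised earlier in the thread.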

coderop2 avatar Mar 09 '19 13:03 coderop2