eli5
XGBoost: sum of weights does not match model prediction
Hello everyone,
Local explanations with eli5.explain_prediction yield confusing results when applied to an XGBoost model.
Indeed, the sum of contributions (computed from the leaf values) does not match the model prediction. This happens when base_score != 0, which is the default for XGBRegressor and XGBClassifier.
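To make the mismatch concrete, here is a minimal pure-Python sketch that uses neither XGBoost nor eli5. It assumes a squared-error regressor (identity link, so the prediction equals the margin), models each tree as a hypothetical depth-1 `Stump`, and applies a path-based (treeinterpreter-style) decomposition, assumed here to approximate what eli5 does for tree ensembles: the root value goes to `<BIAS>`, and the difference between the reached leaf and the root goes to the split feature. Per tree, the pieces add up to the leaf value, so their total misses `base_score`, which the model adds once on top.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """Hypothetical depth-1 regression tree: one split, two leaves."""
    feature: str
    threshold: float
    left: float   # leaf value when x[feature] < threshold
    right: float  # leaf value when x[feature] >= threshold
    root: float   # expected value at the root (cover-weighted mean of the leaves)

    def leaf(self, x):
        return self.left if x[self.feature] < self.threshold else self.right


def predict(trees, x, base_score):
    # Identity link (squared-error regression): prediction == margin.
    return base_score + sum(t.leaf(x) for t in trees)


def path_contributions(trees, x):
    # Root value -> <BIAS>; (reached leaf - root) -> the split feature.
    weights = {"<BIAS>": 0.0}
    for t in trees:
        weights["<BIAS>"] += t.root
        weights[t.feature] = weights.get(t.feature, 0.0) + t.leaf(x) - t.root
    return weights


# Two stumps whose leaves have equal cover, so each root is the leaf average.
trees = [Stump("age", 30.0, -1.0, 2.0, 0.5),
         Stump("income", 50.0, 0.25, -0.75, -0.25)]
x = {"age": 40.0, "income": 10.0}
base_score = 0.5  # nonzero base_score, the case described in this issue

prediction = predict(trees, x, base_score)
weights = path_contributions(trees, x)
print(prediction, sum(weights.values()))  # 2.75 2.25
```

The gap between the two printed numbers is exactly `base_score`.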
Here is the code to reproduce the issue: https://gist.github.com/noleto/987eb668e785a69e87ebf29f56fda55d (Jupyter Notebook format)
So the question is: should ELI5 add base_score to the local score (so that it is consistent with the model prediction), or just better document how to interpret the sum of weights?
Whatever the case, the behavior of the method as it is today can be misleading.
My 2 cents,
This might be the same issue as #251
@lopuhin What should I do here? Should I change the code so that it adds base_score to the eli5 score, or update the documentation?
@coderop2 I think ideally we should make the score shown by eli5 equal to the model score, and also make it clear where this comes from (i.e. show the explicit contribution of base_score somewhere), so that the sum of all feature scores and the base score equals the total score.
Or maybe we could add base_score to the bias?
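The two options discussed here can be compared with a small sketch. It starts from hypothetical weights that sum to 2.25 for a model whose prediction is 2.75 with `base_score = 0.5`. Option A adds a separate row (the `<BASE_SCORE>` label is a hypothetical name, not an existing eli5 feature); option B folds `base_score` into the `<BIAS>` row. Both are illustrative helpers, not eli5 functions.

```python
def explicit_base_score(weights, base_score):
    """Option A: keep <BIAS> unchanged and add a separate base-score row."""
    out = dict(weights)
    out["<BASE_SCORE>"] = base_score  # hypothetical row name
    return out


def fold_into_bias(weights, base_score):
    """Option B: add base_score to the existing <BIAS> row."""
    out = dict(weights)
    out["<BIAS>"] = out.get("<BIAS>", 0.0) + base_score
    return out


# Hypothetical weights that sum to 2.25; the model itself predicts 2.75.
weights = {"<BIAS>": 0.25, "age": 1.5, "income": 0.5}
base_score = 0.5
prediction = 2.75

a = explicit_base_score(weights, base_score)
b = fold_into_bias(weights, base_score)
print(sum(a.values()), sum(b.values()), b["<BIAS>"])  # 2.75 2.75 0.75
```

Both options make the weights sum to the prediction. The difference is interpretability: option B changes what the `<BIAS>` value means, while option A leaves the tree-derived bias untouched.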
Many thanks, guys, for moving forward on this issue. From a tree-based model perspective, base_score can be seen as a kind of bias, so it doesn't shock me to add both together as the total "bias". However, for someone wanting to decompose each part of the explanation, it can be confusing, since the real bias in a tree model represents the mean of the dataset (one may then wonder why the bias shown does not equal that mean).
So, +1 for showing the explicit contribution of base_score (if any) in eli5.explain_prediction.
My 2cts,
Thanks @noleto, making base_score explicit makes sense 👍
What I propose is to include two rows in the HTML template: the first shows the base score and the second shows the sum base_score + eli5 score. This way we explicitly show the base score of the estimator.
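A rough sketch of the two proposed rows, assuming hypothetical eli5 weights and plain `<tr>` markup. eli5's actual HTML templates are not reproduced here; `summary_rows` and `render_rows` are illustrative helpers only.

```python
from html import escape


def summary_rows(weights, base_score):
    """The two proposed rows: base score, then base_score + eli5 score."""
    eli5_score = sum(weights.values())
    return [
        ("base_score", base_score),
        ("base_score + eli5 score", base_score + eli5_score),
    ]


def render_rows(rows):
    # Minimal hypothetical markup, not eli5's actual template.
    return "\n".join(
        f"<tr><td>{escape(label)}</td><td>{value:+.3f}</td></tr>"
        for label, value in rows
    )


weights = {"<BIAS>": 0.25, "age": 1.5, "income": 0.5}  # hypothetical eli5 weights
rows = summary_rows(weights, base_score=0.5)
print(render_rows(rows))
```

With these numbers the second row shows +2.750, which would match the model prediction in the toy regression case.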