[Python] feature contributions from tweedie regression model.

Open Lejboelle opened this issue 2 years ago • 3 comments

I've trained a model using tweedie regression and would like to make predictions that output the different feature contributions. However, it seems like the contributions are not transformed:

import xgboost as xgb
import numpy as np

mdl = xgb.XGBRegressor(objective='reg:tweedie', tweedie_variance_power=1.0, random_state=0)
train_x = np.random.randint(0, 20, (5, 3))
train_y = np.random.random((5, 1))
mdl.fit(train_x, train_y)

test_data = np.random.randint(0, 20, (1, 3))
mdl.predict(test_data)  # outputs: 0.2877181

dmatrix_test = xgb.DMatrix(test_data)
mdl.get_booster().predict(dmatrix_test, pred_contribs=True)  # outputs:  [ 0.  ,  0.  , -0.3573199, -0.8884542]

If I run np.exp(np.sum(mdl.get_booster().predict(dmatrix_test, pred_contribs=True))), I get the same result as mdl.predict(), but I would like the individually transformed contributions.

Not sure if it is a bug or it is just not possible?

Thanks.

Environment: Python 3.9, XGBoost 1.7.6

Lejboelle avatar Nov 29 '23 13:11 Lejboelle

but I would like the individually transformed contributions.

Could you please elaborate on this? What do you mean by individually transformed?

trivialfis avatar Nov 29 '23 19:11 trivialfis

So, as with other objective functions such as pseudohubererror, I would like the contributions to sum to 0.2877181, so that I can see how each feature contributed to the predicted value. Edit: Some years ago there was ongoing work on implementing this in SHAP: https://github.com/shap/shap/pull/1041; however, it seems it was never merged.

Lejboelle avatar Nov 30 '23 07:11 Lejboelle

TreeSHAP in XGBoost is calculated on the "raw" scale, which is the link scale. Like Poisson and Gamma, the Tweedie objective uses the log link, so the SHAP values sum to the log of the prediction minus the baseline.

mayer79 avatar Dec 10 '23 09:12 mayer79