
TreeSHAP Values are inconsistent

Open ntrost-targ opened this issue 4 years ago • 11 comments

Describe the bug When calculating TreeSHAP values for random forest classification, they don't add up. I would expect that the prediction from .vote() minus the respective SHAP values gives me the base value, which is constant and should be the same for different observations. Note that this is the behaviour we observe in Lundberg's Python package. Also, it would be really handy if there were a function that just computes the base value (expected_value in Python) for me.

Expected behavior When calculating TreeSHAP values, I expect them, together with the base value, to add up to the predicted probability.

Actual behavior The calculated base values vary even for observations from the same class.

Code snippet

val iris = read.arff("../data/weka/iris.arff")

val formula: Formula = "class" ~
val x = formula.x(iris).toArray
val y = formula.y(iris).toIntArray

val model = smile.classification.randomForest(formula, iris)

val arr50 = new Array[Double](3)
val arr52 = new Array[Double](3)

// class probabilities from the ensemble vote
model.vote(iris(50), arr50)
model.vote(iris(52), arr52)

// SHAP values, flattened over (feature, class) pairs
val shap_50 = model.shap(iris(50))
val shap_52 = model.shap(iris(52))

// probability of class 1 minus the sum of its SHAP values
// (indices 1, 4, 7, ...) should recover a constant base value
arr50(1) - shap_50.indices.filter(x => (x + 2) % 3 == 0).map(shap_50).sum
// res15: Double = 0.41123849878987584
arr52(1) - shap_52.indices.filter(x => (x + 2) % 3 == 0).map(shap_52).sum
// res16: Double = 0.4571260444068466

Input data Iris data set

Additional context

  • using smile from the Try-It-Online binder

ntrost-targ avatar Jun 09 '21 13:06 ntrost-targ

What if you use model.predict() instead of model.vote()?

haifengl avatar Jun 09 '21 16:06 haifengl

When I tried it, model.predict() gave me only the most probable class as output, not the class probabilities.

ntrost-targ avatar Jun 09 '21 19:06 ntrost-targ

I also tried model.score(), but that threw an error.

ntrost-targ avatar Jun 09 '21 19:06 ntrost-targ

predict is overloaded. Try predict(x, prob), where prob is an output array for the class probabilities.

haifengl avatar Jun 10 '21 01:06 haifengl

I tried; the problem persists, albeit with smaller variance. I now get 0.3267 vs. 0.3289, which is a more realistic value given the balanced three-class data set. With Lundberg's package the variance is much smaller.

ntrost-targ avatar Jun 10 '21 13:06 ntrost-targ

I don't understand this part:

I would expect that the prediction from .vote() minus the respective SHAP Values gives me the base value which is constant

Why? Do you have a link or a paper that supports this?

haifengl avatar Jun 10 '21 20:06 haifengl

Hi,

I am a colleague of ntrost-targ. Our assumption comes from the local accuracy / additivity property of SHAP explanations (see https://ema.drwhy.ai/breakDown.html#BDMethodGen and its Titanic example, with base value = 0.2353095 and the posterior probabilities as f(x)): the posterior probability of any sample x should be the sum of a common base value (the base SHAP value phi_0, i.e. the mean class posterior over the full dataset) and the local attribution effects (the local SHAP values from model.shap(iris(50))) for that sample.

That is, f(x) = phi_0 + \sum_{i=1}^M phi_i(x) = p(x) = class posterior

(see also https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7326367/#S10title , Property (1) ).

We just wanted to verify equality with Lundberg's Python implementation and did the reverse computation, p(x) - \sum_{i=1}^M phi_i(x) = f(x) - \sum_{i=1}^M phi_i(x) = phi_0, which should give us a roughly constant phi_0 for all samples (modulo numerical issues). The Python implementation always gives the same phi_0 (diverging only from the fourth decimal place; the SMILE values start to diverge from the third decimal place). The range for phi_0 in one setting was [0.50026, 0.50597]. Might be nitpicking here ;) We were just wondering.
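For reference, the local-accuracy property we are relying on can be checked end to end with a brute-force Shapley computation on a toy model. Everything below (the linear model, the background data, and the sample points) is made up for illustration; only the Shapley weighting formula itself is standard:

```python
import itertools
import math

def shapley_values(f, x, background):
    """Exact (brute-force) interventional Shapley values of model f at point x."""
    M = len(x)

    def v(S):
        # Coalition value: features in S take x's values, the rest are
        # averaged over the background data set.
        total = 0.0
        for b in background:
            z = [x[i] if i in S else b[i] for i in range(M)]
            total += f(z)
        return total / len(background)

    phi = [0.0] * M
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for r in range(M):
            for subset in itertools.combinations(others, r):
                S = set(subset)
                w = (math.factorial(len(S)) * math.factorial(M - len(S) - 1)
                     / math.factorial(M))
                phi[i] += w * (v(S | {i}) - v(S))
    return phi, v(set())  # attributions and the base value phi_0 = E[f]

# Toy linear model and background data, made up for illustration.
f = lambda z: 0.5 * z[0] + 2.0 * z[1] - z[2]
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0]]

for x in ([1.0, 2.0, 0.5], [3.0, -1.0, 2.0]):
    phi, phi0 = shapley_values(f, x, background)
    # Local accuracy: f(x) = phi_0 + sum(phi), so f(x) - sum(phi)
    # recovers the same base value for every sample.
    print(round(f(x) - sum(phi), 9), round(phi0, 9))
```

Both samples recover the identical base value (f evaluated at the background mean), which is exactly the invariance we checked for in SMILE.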

Cheers, Marc

mroettig avatar Jun 10 '21 21:06 mroettig

Hi Marc, thanks for the explanation. Although I am not 100% sure, I think this small difference comes from the smoothing of the posterior probability. Depending on the leaf node size, this smoothing may have a slightly different impact on the posterior probability calculation.

If you choose two samples hitting the same leaf node, I guess this difference will be smaller. It is hard to know whether two samples arrive at the same leaf node. As a workaround, I suggest computing the difference over all the samples of one class. I guess you will find several clusters of values with tiny differences.
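A minimal sketch of that grouping step (the probabilities and SHAP sums below are hypothetical, not from Smile; the idea is just to round the recovered per-sample base values and count the clusters):

```python
from collections import Counter

def base_value_clusters(probs, shap_sums, decimals=3):
    """Group the recovered base values p(x) - sum(SHAP) into clusters.

    probs[i] is the predicted probability of the class of interest for
    sample i; shap_sums[i] is the sum of that sample's SHAP values for
    the same class. Samples ending in the same leaf nodes should land
    in the same cluster."""
    recovered = (p - s for p, s in zip(probs, shap_sums))
    return Counter(round(b, decimals) for b in recovered)

# Hypothetical numbers for four samples of one class:
probs     = [0.91, 0.90, 0.88, 0.93]
shap_sums = [0.58, 0.57, 0.56, 0.61]
print(base_value_clusters(probs, shap_sums))
# two clusters of two samples each, e.g. Counter({0.33: 2, 0.32: 2})
```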

haifengl avatar Jun 11 '21 01:06 haifengl

Hi Haifeng,

I tried the same calculation with Gradient Boosted Trees (smile.classification.gbm(formula, iris)) but arrive at 0.59 vs. 0.69 (which is also an odd value in absolute terms; I'm expecting roughly 0.33). Also, the variance over the whole iris dataset in Python is negligible, on the order of 1e-16; see the script below. Local explainability with SHAP is very important for us, and we would be thankful if you could take a deeper look into the issue. From what I can tell, any numerical issues should have a far smaller variance than what I see here.

import shap
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import numpy as np

iris = datasets.load_iris()
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)  # fit before explaining
explainer = shap.TreeExplainer(clf)

probs = clf.predict_proba(iris.data).transpose()[0]
shap_values = explainer.shap_values(iris.data)

b_0 = np.array([probs[i]-shap_values[0][i].sum() for i in range(150)])

b_0.std()
# 1.6922557229846184e-16

b_0.mean()
# 0.32906666666666645

explainer.expected_value
# array([0.32906667, 0.33373333, 0.3372    ])

Notice how the backward calculation matches explainer.expected_value for the first class.

Bests, Nikolaus :)

ntrost-targ avatar Jun 11 '21 15:06 ntrost-targ

Hi Haifeng,

I just came across the Commercial License Usage clause in the SMILE license, which applies when using SMILE in a commercial setting (i.e. incorporating SMILE into commercial products). But I could not find any further details on the website regarding the modalities and costs of the commercial license.

Could you give us details on that topic and on when commercial licensing is required? And could we request a deeper look into our SHAP issue on your side if we become commercial subscribers?

Thanks a lot in advance + Cheers, Marc

mroettig avatar Jun 17 '21 12:06 mroettig

@mroettig please contact me by email.

haifengl avatar Jun 17 '21 12:06 haifengl