Integrated-Gradients How would you justify negative attributions?

Open Yijun88 opened this issue 5 years ago • 3 comments

Hi Ankur,

Thank you for the excellent work on Integrated Gradients! It provides a great guideline for exploring what a neural network is doing. Is there any justification for negative attributions, or should we just interpret them as smaller attributions? Seeing negative attributions in LSTMs is not that intuitive.

E.g., given Attribution_1 = -1 and Attribution_2 = 1, can we naively conclude that Attribution_2 has more impact on the final result?

Best, Yijun

Dec 04 '19 07:12 Yijun88

A negative attribution usually means that removing that pixel would increase the probability of that class, while a positive attribution means that removing it would decrease the probability of that class. Does that help?

I am wondering why pixels with large positive and large negative attributions often appear next to each other. Does anyone have a thought on that?
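
To make the sign convention concrete, here is a minimal sketch rather than the repository's implementation. It assumes a made-up logistic model (`w`, `b`, and the input are hypothetical values), an all-zeros baseline, and a midpoint Riemann sum along the straight path from baseline to input. "Removing" a feature is modelled as resetting it to its baseline value.

```python
import numpy as np

# Hypothetical toy model: f(x) is the probability of the class being explained.
w = np.array([2.0, -3.0, 0.5])
b = 0.1

def f(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def grad_f(x):
    # Analytic gradient of the sigmoid output with respect to the input.
    p = f(x)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, steps=200):
    # Midpoint Riemann sum over the straight path from baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)
attr = integrated_gradients(x, baseline)

# Feature 1 gets a negative attribution: its value pushes the score down
# relative to the baseline, so resetting it to the baseline raises f.
x_removed = x.copy()
x_removed[1] = baseline[1]
print(attr, f(x), f(x_removed))
```

In this sketch the attributions sum (approximately) to `f(x) - f(baseline)`, so the sign gives the direction of a feature's contribution and the absolute value gives its size. Under that reading, attributions of -1 and 1 have equal impact in opposite directions.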

Jan 15 '21 08:01 expectopatronum

Hi guys, do you know what "that class" is by default? For example, in a binary classification problem with class names 0 and 1, how should I interpret a negative attribution?

Mar 05 '21 09:03 ajbanegas

By "that class" I mean the class that is currently being explained, for which I usually use the predicted class of the example.
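
For the binary case, a small sketch may help. It assumes a hypothetical logistic model whose two class probabilities sum to one (as with a sigmoid or a two-way softmax); `w`, `b`, the zero baseline, and the `target` argument are illustrative choices, not part of the repository's API. Under that assumption, the attributions for class 0 are the exact negation of those for class 1, so the sign is only meaningful once the explained class is fixed.

```python
import numpy as np

# Hypothetical logistic model; w and b are made-up values.
w = np.array([2.0, -3.0, 0.5])
b = 0.1

def prob_class1(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def grad_prob(x, target):
    # Gradient of p(class 1); p(class 0) = 1 - p(class 1), so its gradient flips sign.
    p1 = prob_class1(x)
    g1 = p1 * (1.0 - p1) * w
    return g1 if target == 1 else -g1

def integrated_gradients(x, baseline, target, steps=200):
    # Midpoint Riemann sum over the straight path from baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_prob(baseline + a * (x - baseline), target) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)

# Explain the predicted class, as suggested in the comment above.
predicted = int(prob_class1(x) >= 0.5)
attr_predicted = integrated_gradients(x, baseline, predicted)

attr_class1 = integrated_gradients(x, baseline, target=1)
attr_class0 = integrated_gradients(x, baseline, target=0)
print(predicted, attr_class1, attr_class0)
```

Here a feature with a negative attribution for class 1 has a positive attribution of the same size for class 0, so in a two-class problem a negative value can be read as evidence for the other class.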

Mar 09 '21 05:03 expectopatronum
