
Why is sigmoid activation for LRP not allowed?

Open CloseChoice opened this issue 1 year ago • 1 comment

❓ Questions and Help

I tried LRP on a small model with a sigmoid activation, but it's actually tested here that this does not work. Is there a specific reason for that? IMO, since sigmoid is a scalar operation, it should work analogously to ReLU and Tanh, which can be used with LRP.

Simply adding sigmoid here yields the expected result. So why not just do so?
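For reference, a minimal sketch of the setup in question (the model architecture and input here are made up for illustration). Swapping the sigmoid for `nn.ReLU()` or `nn.Tanh()` runs fine; with `nn.Sigmoid()`, Captum's LRP is expected to reject the model because the layer is not in its list of supported non-linear layers:

```python
import torch
import torch.nn as nn
from captum.attr import LRP

# Hypothetical toy model: the shapes and layers are only for illustration.
model = nn.Sequential(
    nn.Linear(2, 4),
    nn.Sigmoid(),  # replacing this with nn.ReLU() or nn.Tanh() works with LRP
    nn.Linear(4, 1),
)

inputs = torch.tensor([[2.0, 1.0]])

try:
    lrp = LRP(model)
    attributions = lrp.attribute(inputs)
    print(attributions)
except Exception as err:
    # Captum currently errors out here because nn.Sigmoid has no LRP rule
    # registered for it.
    print(f"LRP rejected the model: {err}")
```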

I would be willing to create the PR and add a test for this if there is no reason not to.

CloseChoice avatar Oct 01 '24 15:10 CloseChoice

LRP was designed for ReLU networks and generalized to leaky ReLU. My guess is that it's because the sigmoid function does not satisfy f(0) = 0 and sign(f(-x)) = -1, which leads to unintuitive results, as in the following example:

f(x) = sigmoid(x1*w1 + x2*w2) = sigmoid(z1 + z2)

With x1 = 2, x2 = 1 and w1 = -1, w2 = 1:

z1 = -2, z2 = 1
f(x) = sigmoid(-2 + 1) = sigmoid(-1) = 0.2689

x1 (or z1) pushes towards a lower activation and x2 (or z2) pushes towards a higher activation. So x1 should be assigned a small relevance and x2 a greater relevance, but: R1 = 0.5379 and R2 = -0.2689
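To spell those numbers out, here is a small plain-Python sketch (not Captum) that reproduces them with the basic LRP-0 redistribution, plus the two sigmoid properties mentioned above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2 = 2.0, 1.0
w1, w2 = -1.0, 1.0

z1, z2 = x1 * w1, x2 * w2      # z1 = -2, z2 = 1
out = sigmoid(z1 + z2)         # sigmoid(-1) = 0.2689

# Basic LRP-0 rule: R_i = z_i / (z1 + z2) * R_out, with R_out = out
R1 = z1 / (z1 + z2) * out      # 0.5379: large positive, although z1 lowered the activation
R2 = z2 / (z1 + z2) * out      # -0.2689: negative, although z2 raised the activation
print(out, R1, R2)

# Properties that hold for ReLU but not for sigmoid:
print(sigmoid(0.0))            # 0.5, so f(0) != 0
print(sigmoid(-1.0))           # 0.2689 > 0, so sign(f(-x)) != -1
```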

nicogross avatar Feb 04 '25 19:02 nicogross