Why is sigmoid activation for LRP not allowed?
❓ Questions and Help
I tried to use LRP on a small model with a sigmoid activation, but it's actually tested here that this does not work. Is there a specific reason for that? IMO, since sigmoid is a scalar operation, it should work analogously to ReLU and Tanh, which can be used with LRP.
Simply adding sigmoid here yields the expected result. So why not just do so?
I would be willing to create the PR and add a test for this if there is no reason not to.
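For context, this is roughly the kind of model I tried (a minimal sketch; the architecture and input values are just illustrative):

```python
import torch
import torch.nn as nn
from captum.attr import LRP

# Minimal illustrative model with a sigmoid activation.
model = nn.Sequential(
    nn.Linear(2, 4),
    nn.Sigmoid(),  # swapping this for nn.ReLU() or nn.Tanh() works fine
    nn.Linear(4, 1),
)

inp = torch.tensor([[2.0, 1.0]])
lrp = LRP(model)
# Fails at the sigmoid layer under the current supported-layer check,
# while ReLU and Tanh are accepted.
attr = lrp.attribute(inp, target=0)
```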
LRP was designed for ReLU networks and was generalized to leaky ReLU. My guess is that it is because the sigmoid function does not satisfy f(0) = 0 and sign(f(-x)) = -1 (its output is always positive), which leads to unintuitive results, like the following example:
f(x) = sigmoid(x1*w1 + x2*w2) = sigmoid(z1 + z2)

x1 = 2 and x2 = 1
w1 = -1 and w2 = 1
-> z1 = -2 and z2 = 1

f(x) = sigmoid(-2 + 1) = sigmoid(-1) = 0.2689
x1 (or z1) pushes the output toward a lower activation and x2 (or z2) pushes it toward a higher activation, so x1 should be assigned a small relevance and x2 a greater relevance. But distributing the output relevance R = f(x) = 0.2689 proportionally to the contributions z_i (basic z-rule, treating the sigmoid as identity) gives:

R1 = z1 / (z1 + z2) * R = 0.5379 and R2 = z2 / (z1 + z2) * R = -0.2689
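The same calculation in code (a quick sketch of the basic z-rule with the sigmoid treated as a pass-through, using the numbers above):

```python
import torch

# Numbers from the example above.
x = torch.tensor([2.0, 1.0])
w = torch.tensor([-1.0, 1.0])

z = x * w                      # contributions: z1 = -2, z2 = 1
out = torch.sigmoid(z.sum())   # sigmoid(-1) = 0.2689

# Distribute the output relevance proportionally to each contribution z_i
# (basic z-rule, sigmoid treated as identity).
R = z / z.sum() * out
print(R)  # tensor([ 0.5379, -0.2689])
```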