ViT-pytorch

Why do we need to account for residual connections when visualizing attention maps?

JamenceTom opened this issue 3 years ago · 3 comments

Thanks for your great work! I am curious why we need to account for residual connections when visualizing attention maps? [image: screenshot of the attention-map visualization code]

JamenceTom avatar Dec 16 '21 07:12 JamenceTom

I'm curious too! Why do we need this?

vgoklani avatar May 12 '22 15:05 vgoklani

Same question. Hi @jeonsworld, could you please elaborate on the specific reason for adding this identity matrix? Much appreciated.

JimEverest avatar Feb 13 '23 15:02 JimEverest

In my opinion, it is because ViT's transformer blocks have residual connections: each block outputs Attention(x) + x rather than Attention(x) alone. So from layer 1 to 12, the token mixing the model actually applies corresponds to (A + I), not A, which is why the identity matrix is added before combining and visualizing the attention maps (see the sketch after this comment).

SeungHyun104 avatar Nov 17 '23 04:11 SeungHyun104
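
For reference, here is a minimal sketch of the kind of computation being discussed (attention rollout with the identity matrix modeling the residual connection). The function name, shapes, and usage are illustrative assumptions, not the repository's exact code:

```python
import torch

def attention_rollout(attn_per_layer):
    """Combine per-layer attention maps while accounting for residual connections.

    attn_per_layer: list of tensors, one per transformer layer,
                    each of shape (num_heads, seq_len, seq_len).
    Returns a (seq_len, seq_len) matrix of effective token-to-token attention.
    """
    # Average attention over heads for each layer: (num_layers, seq_len, seq_len).
    attn = torch.stack([a.mean(dim=0) for a in attn_per_layer])

    # Each block computes x + Attention(x), so its token-mixing matrix is
    # (A + I), not A alone. Add the identity and re-normalize the rows so
    # they still sum to 1.
    eye = torch.eye(attn.size(-1))
    aug = attn + eye
    aug = aug / aug.sum(dim=-1, keepdim=True)

    # Multiply the per-layer mixing matrices to get the joint ("rolled-out")
    # attention from the input tokens to the last layer.
    rollout = aug[0]
    for layer in aug[1:]:
        rollout = layer @ rollout
    return rollout

# Illustrative usage with random attention weights
# (12 layers, 12 heads, 197 tokens = 1 [CLS] + 196 patches):
layers = [torch.softmax(torch.randn(12, 197, 197), dim=-1) for _ in range(12)]
rollout = attention_rollout(layers)
# Attention of the [CLS] token (index 0) over the patch tokens:
cls_to_patches = rollout[0, 1:]
```

Without the added identity, multiplying the raw per-layer attention matrices would ignore the part of each token's representation that passes through the residual branch unchanged, so the visualized map would understate how much each token attends to itself.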