ViT-pytorch
Why do we need to calculate residual connections when visualizing attention maps?
Thanks for your great work!
I am curious why we need to calculate residual connections when visualizing attention maps.
I'm curious too! Why do we need this?
Same question here. Hi @jeonsworld, could you please elaborate on the specific reason for adding this identity matrix? Much appreciated.
In my opinion: ViT's transformer blocks contain residual connections, so from layer 1 to 12 the attention map the model effectively uses is the residual attention map, i.e. the raw attention plus the identity contributed by the skip connection.
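To make that concrete: adding the identity matrix is the standard trick from the attention-rollout idea (Abnar & Zuidema, "Quantifying Attention Flow in Transformers"). Below is a minimal sketch of that idea; the function name, tensor shapes, and exact normalization are my own illustration, not necessarily the repo's code.

```python
import torch

def rollout_attention(att_mats):
    """att_mats: list of per-layer attention tensors, each of shape [heads, tokens, tokens]."""
    result = None
    for att in att_mats:
        att = att.mean(dim=0)  # average over heads -> [tokens, tokens]
        # Residual connection: each block computes x + Attention(x), so the
        # effective token-mixing matrix is A + I, re-normalized so rows sum to 1.
        aug = att + torch.eye(att.size(-1))
        aug = aug / aug.sum(dim=-1, keepdim=True)
        # Multiply layer by layer to track how information flows through layers 1..12.
        result = aug if result is None else aug @ result
    return result  # [tokens, tokens]; the [CLS] row gives the map to overlay on the image
```

Without the `+ I` term, the rollout would pretend each layer's output depends only on the attention weights, ignoring the fact that the skip connection passes each token straight through; multiplying such matrices across 12 layers would badly distort the visualized map.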