SeungHyun104

Results: 1 comment from SeungHyun104

In my opinion, since ViT's transformer blocks use residual connections, the attention map the model actually applies at each of layers 1 through 12 is a residual attention map, i.e. the attention matrix combined with the identity contributed by the skip connection.
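
A minimal sketch of this idea, assuming head-averaged per-layer attention matrices as NumPy arrays and following the common attention-rollout convention of folding the skip connection in as (A + I) with row re-normalization; the function names `residual_attention` and `attention_rollout` are illustrative, not from any particular library.

```python
import numpy as np

def residual_attention(attn_maps):
    """Fold the residual (skip) connection into each layer's attention map.

    attn_maps: list of (num_tokens, num_tokens) head-averaged attention
    matrices, one per transformer layer (12 for ViT-Base).
    Returns per-layer residual attention maps (A + I), row-normalized.
    """
    num_tokens = attn_maps[0].shape[0]
    identity = np.eye(num_tokens)
    residual_maps = []
    for attn in attn_maps:
        # The skip connection passes each token through unchanged,
        # so the effective mixing matrix is (A + I), re-normalized
        # so every row still sums to 1.
        mixed = attn + identity
        mixed = mixed / mixed.sum(axis=-1, keepdims=True)
        residual_maps.append(mixed)
    return residual_maps

def attention_rollout(residual_maps):
    """Multiply the residual attention maps from layer 1 up to the last
    layer (12 in ViT-Base) to estimate how attention propagates to the
    final representation."""
    rollout = residual_maps[0]
    for mixed in residual_maps[1:]:
        rollout = mixed @ rollout
    return rollout
```

Under these assumptions, visualizing `residual_attention(attn_maps)[k]` (or the rollout across all 12 layers) rather than the raw attention matrix reflects what the model actually mixes into the token representations.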