SeungHyun104

Results: 1 comment from SeungHyun104

In my opinion, since ViT's transformer blocks use residual connections, the attention map the model actually applies at each of layers 1 through 12 is a residual attention map, i.e. the attention matrix combined with the identity contributed by the skip connection.
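
A minimal sketch of this idea, assuming head-averaged per-layer attention matrices as NumPy arrays and following the common attention-rollout convention of folding the skip connection in as (A + I) with row re-normalization; the function names `residual_attention` and `attention_rollout` are illustrative, not from any particular library.

```python
import numpy as np

def residual_attention(attn_maps):
    """Fold the residual (skip) connection into each layer's attention map.

    attn_maps: list of (num_tokens, num_tokens) head-averaged attention
    matrices, one per transformer layer (12 for ViT-Base).
    Returns per-layer residual attention maps (A + I), row-normalized.
    """
    num_tokens = attn_maps[0].shape[0]
    identity = np.eye(num_tokens)
    residual_maps = []
    for attn in attn_maps:
        # The skip connection passes each token through unchanged,
        # so the effective mixing matrix is (A + I), re-normalized
        # so every row still sums to 1.
        mixed = attn + identity
        mixed = mixed / mixed.sum(axis=-1, keepdims=True)
        residual_maps.append(mixed)
    return residual_maps

def attention_rollout(residual_maps):
    """Multiply the residual attention maps from layer 1 up to the last
    layer (12 in ViT-Base) to estimate how attention propagates to the
    final representation."""
    rollout = residual_maps[0]
    for mixed in residual_maps[1:]:
        rollout = mixed @ rollout
    return rollout
```

Under these assumptions, visualizing `residual_attention(attn_maps)[k]` (or the rollout across all 12 layers) rather than the raw attention matrix reflects what the model actually mixes into the token representations.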