
Attention maps

Tato14 opened this issue 4 years ago · 6 comments

Hi! First, thanks for the great resource. I was wondering how difficult it would be to implement the attention results shown in Fig. 6 and Fig. 13 of the paper. I am not quite familiar with transformers. Is this similar to GradCAM, or is it a different approach?

Tato14 avatar Oct 05 '20 07:10 Tato14

@Tato14 Hi Joan! It seems the approach came from https://arxiv.org/pdf/2005.00928.pdf I'll have to read it after I get through my queue of papers this week to see how difficult it is to implement! Feel free to keep this issue open in the meantime.

lucidrains avatar Oct 05 '20 15:10 lucidrains

@Tato14 the naive attention map for individual layers is this variable attn https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py#L56
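If it helps, here is a minimal sketch for pulling that tensor out with forward hooks. It assumes the softmax inside the attention block is an `nn.Softmax` submodule; older versions of the file call `.softmax()` functionally, in which case you would need to return `attn` from `forward` yourself:

```python
import torch
import torch.nn as nn
from vit_pytorch import ViT

v = ViT(image_size=256, patch_size=32, num_classes=1000,
        dim=1024, depth=6, heads=8, mlp_dim=2048)

attn_maps = []

def save_attn(module, inputs, output):
    # the softmax output is the post-softmax attention matrix,
    # shape (batch, heads, num_patches + 1, num_patches + 1)
    attn_maps.append(output.detach())

# hook every softmax in the network (assumes attention uses an nn.Softmax module)
for module in v.modules():
    if isinstance(module, nn.Softmax):
        module.register_forward_hook(save_attn)

img = torch.randn(1, 3, 256, 256)
preds = v(img)  # attn_maps now holds one naive attention map per transformer layer
```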

lucidrains avatar Oct 05 '20 16:10 lucidrains

It's probably this line https://github.com/lucidrains/vit-pytorch/blob/6c8dfc185ea41f4d2388e4d33bbb76f900ff8a0a/vit_pytorch/vit_pytorch.py#L63

lukasfolle avatar Nov 19 '20 08:11 lukasfolle

Why is the softmax only applied to dim=-1? Shouldn't the softmax be calculated over the last two dimensions, i.e. over the whole matrix, instead of just one dimension of it?
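To make the question concrete (shapes are hypothetical): dim=-1 normalizes each query's row over all keys, whereas a whole-matrix softmax would couple scores across different queries:

```python
import torch

n = 65                          # num_patches + cls token, hypothetical
dots = torch.randn(1, 8, n, n)  # (batch, heads, queries, keys)

attn = dots.softmax(dim=-1)
print(attn.sum(dim=-1))         # every row sums to 1: one distribution per query

whole = dots.flatten(-2).softmax(dim=-1).unflatten(-1, (n, n))
print(whole.sum(dim=(-2, -1)))  # here the entire n x n grid sums to 1 instead
```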

edit: I'll open a separate Issue

PascalHbr avatar Dec 17 '20 12:12 PascalHbr

Hi @lucidrains, is there any news on the attention map visualization? Thanks!

suarezjessie avatar Mar 14 '21 05:03 suarezjessie

It seems this has been implemented; see the description in the README here.
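For anyone landing here later, the usage per that README section looks roughly like this (import paths may have shifted between versions, so check the current README):

```python
import torch
from vit_pytorch import ViT
from vit_pytorch.recorder import Recorder

v = ViT(image_size=256, patch_size=32, num_classes=1000,
        dim=1024, depth=6, heads=8, mlp_dim=2048)
v = Recorder(v)  # wraps the model and records attention during the forward pass

img = torch.randn(1, 3, 256, 256)
preds, attns = v(img)
# attns: (batch, layers, heads, num_patches + 1, num_patches + 1)
```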

jpgard avatar Apr 10 '21 00:04 jpgard