pytorch-grad-cam Strange! The last block doesn't get the right answer, the others can.

Open pakchoi-php opened this issue 3 years ago • 4 comments

Hi, when visualizing a ViT, I can't get a correct visualization when the last block is the target layer. The output gradient is shown in the figure below. I hope you can help. (Other blocks give correct results; I use the class token as the classification feature.) Uploading image.png…

Nov 16 '22 12:11 pakchoi-php

What is the exact layer that you used?

The output of a ViT consists of the spatial tokens plus the cls token, and classification is done on the cls token alone. This means the other tokens in the last layer's output are not connected to the model output: their gradients are zero, so they won't work as a CAM target. When you go one layer back, the spatial tokens are connected to the output (through the cls token in the layer above).
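A minimal sketch of this effect, assuming a toy stand-in for the tail of a ViT rather than the network from this issue: two `nn.TransformerEncoderLayer` blocks followed by a linear head that reads only the first (cls) token. The sizes and the names `block_prev`, `block_last` and `head` are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 1 cls token + 4 spatial tokens, embedding dim 16.
dim, num_tokens = 16, 5

def make_block():
    # dropout=0.0 keeps the toy example deterministic.
    return nn.TransformerEncoderLayer(
        d_model=dim, nhead=2, dim_feedforward=32,
        dropout=0.0, batch_first=True,
    )

block_prev = make_block()   # stands in for the second-to-last block
block_last = make_block()   # stands in for the last block
head = nn.Linear(dim, 3)    # classifier applied to the cls token only

x = torch.randn(1, num_tokens, dim)
out_prev = block_prev(x)
out_last = block_last(out_prev)

# Keep gradients of these intermediate (non-leaf) tensors.
out_prev.retain_grad()
out_last.retain_grad()

# Classification uses only token 0 (the cls token).
logits = head(out_last[:, 0])
logits[0, 0].backward()

grad_last = out_last.grad
grad_prev = out_prev.grad

# Spatial tokens of the last block's output never reach the logits.
print("last block, spatial-token grad sum:", grad_last[:, 1:].abs().sum().item())
# Spatial tokens one block earlier reach the logits via attention into the cls token.
print("prev block, spatial-token grad sum:", grad_prev[:, 1:].abs().sum().item())
```

In this sketch the gradient at the spatial positions of the last block's output is exactly zero, while one block earlier it is not. A gradient-based CAM that weights activations by these gradients therefore has nothing to work with at the last block's output, so a target layer whose spatial tokens still feed the cls token is the safer choice.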

Nov 16 '22 13:11 jacobgil

This is the network I used for testing. "last blocks" is just a transformer block; the encoder contains three transformer blocks, and I added other modules before "last blocks" during training. Thank you very much for your reply.

Nov 17 '22 01:11 pakchoi-php
