pytorch-grad-cam

Strange! The last block doesn't give the right result, but the others do.

Open pakchoi-php opened this issue 3 years ago • 4 comments

Hi, when visualizing a ViT, I couldn't get a correct visualization when the last block was the target layer. The output gradient is shown in the figure below. I hope you can help. (Other blocks give correct results; I used the class token as the classification feature.)

pakchoi-php avatar Nov 16 '22 12:11 pakchoi-php

image

pakchoi-php avatar Nov 16 '22 12:11 pakchoi-php

image

pakchoi-php avatar Nov 16 '22 12:11 pakchoi-php

What is the exact layer that you used?

The output of a ViT consists of the spatial tokens plus the cls token, and classification is done on the cls token only. This means the other tokens in the last layer's output are not connected to the model output, so their gradients are zero and they won't work as a target. When you go one layer back, the spatial tokens are connected to the output (through the cls token in the layer above).
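
The point above can be checked on a toy model. The sketch below is not the poster's network and does not use the pytorch-grad-cam API: it assumes three `torch.nn.TransformerEncoderLayer` blocks standing in for ViT blocks, made-up sizes, a random input, and a linear head that reads only token 0 (playing the role of the cls token). It records the gradient of one class score with respect to each block's output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes (assumptions for illustration): 1 cls token + 4 spatial tokens.
dim, num_tokens, num_classes = 8, 5, 3

# Three generic transformer blocks standing in for ViT blocks.
blocks = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=dim, nhead=2, dim_feedforward=16,
                               dropout=0.0, batch_first=True)
    for _ in range(3)
])
head = nn.Linear(dim, num_classes)

x = torch.randn(1, num_tokens, dim)  # random stand-in for embedded patches
outputs = []
for block in blocks:
    x = block(x)
    x.retain_grad()      # keep the gradient of this intermediate output
    outputs.append(x)

logits = head(x[:, 0])   # classification uses the cls token only
logits[0, 1].backward()  # backpropagate an arbitrary class score

# Per-token gradient norm at the output of each block.
grad_norms = [out.grad[0].norm(dim=-1) for out in outputs]
for i, norms in enumerate(grad_norms):
    print(f"block {i}: {norms.tolist()}")
```

Under these assumptions, the last block's output has a nonzero gradient only at token 0, while the earlier blocks have nonzero gradients at the spatial tokens too, because attention in the following block mixes them into the cls token.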

jacobgil avatar Nov 16 '22 13:11 jacobgil

image

This is the network I used for testing. "last blocks" is just a transformer block, and the encoder has three transformer blocks. During training I added other modules before "last blocks". Thank you very much for your reply.

pakchoi-php avatar Nov 17 '22 01:11 pakchoi-php