vit-pytorch
Using the Attention Outputs
Hi! I was just wondering how to properly use the attention output. Based on the README.md, the `Recorder` wrapper returns the attention as a tensor of shape (batch x layers x heads x patch x patch). In this case, so that we can properly overlay the attention on our original images, we need to do the following:
- Choose a layer for which the attention should be computed
- Rearrange the patches back to the original image shape
- Average the rearranged patches across all heads
Is this the correct way?
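For reference, here is a minimal sketch of those three steps, assuming the `Recorder` wrapper from the README and a hypothetical 256×256 input with 32×32 patches (an 8×8 patch grid plus one CLS token). The layer choice and the use of the CLS token's attention row are illustrative assumptions, not the library's prescribed method:

```python
import torch
import torch.nn.functional as F
from vit_pytorch import ViT
from vit_pytorch.recorder import Recorder

# Hypothetical model configuration, taken from the README example.
v = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048,
)
v = Recorder(v)

img = torch.randn(1, 3, 256, 256)   # dummy image batch
preds, attns = v(img)
# attns: (batch, layers, heads, n, n) where n = num_patches + 1 (CLS token)
# here: (1, 6, 16, 65, 65)

# Step 1: choose a layer (the last one, as an illustrative choice)
attn = attns[:, -1]                  # (batch, heads, n, n)

# Step 3 (done before step 2 for convenience): average across heads
attn = attn.mean(dim = 1)            # (batch, n, n)

# Attention paid by the CLS token to each image patch (drop the CLS column)
cls_attn = attn[:, 0, 1:]            # (batch, num_patches)

# Step 2: rearrange the patch scores back into the image's spatial grid
side = int(cls_attn.shape[-1] ** 0.5)        # 8 for a 256px image, 32px patches
heatmap = cls_attn.reshape(-1, 1, side, side)

# Upsample the patch grid to image resolution for overlaying
heatmap = F.interpolate(heatmap, size = (256, 256), mode = 'bilinear',
                        align_corners = False)
heatmap = heatmap.squeeze(1)         # (batch, 256, 256), overlay onto img
```

One caveat: this sketch averages across heads before rearranging (swapping the order of steps 2 and 3 above), which gives the same result, and it only visualizes the CLS token's attention for a single layer; techniques like attention rollout combine attention across all layers instead.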
Hi @suarezjessie! Did you manage to plot the attention maps?