how_attentive_are_gats
Reproduce the GAT v1 attention matrix
Thanks for your great contribution!! I'm confused about Figure 1(a) in your paper. Which layer of GAT is this attention matrix from? Is the attention matrix the same for all layers? Do the attention matrices of different heads within one layer look like this as well?
Best regards
Hi @ALEX13679173326! Thank you for your interest in our work!
This is one of the heads of a single GAT/GATv2 layer, trained on the DictionaryLookup problem (Figure 2). Regarding different layers: this problem can be solved with a single layer, so we trained only one, but the same pattern would appear in every layer of a deeper model (possibly with a different argmax key), because GAT simply cannot express any other pattern. Regarding different heads: the figure visualizes just one head, but all other heads exhibit the same pattern, again because GAT cannot express any other pattern.
Does that answer your questions? Feel free to let us know if anything is unclear.
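For illustration, here is a minimal sketch of how a single attention head of a layer can be extracted and plotted. It assumes a model built with PyTorch Geometric's `GATConv`/`GATv2Conv`; the toy bipartite graph, the untrained layer, and the plotting details below are placeholders, not the exact setup used for Figure 1(a) (there, the layer is first trained on the DictionaryLookup benchmark).

```python
import torch
import matplotlib.pyplot as plt
from torch_geometric.nn import GATConv  # swap in GATv2Conv to compare

# Hypothetical toy graph: every query node attends to every key node.
num_keys, num_queries, dim = 5, 5, 16
x = torch.randn(num_keys + num_queries, dim)

src = torch.arange(num_keys).repeat(num_queries)                               # key nodes 0..4
dst = torch.arange(num_keys, num_keys + num_queries).repeat_interleave(num_keys)  # query nodes 5..9
edge_index = torch.stack([src, dst], dim=0)

# Untrained placeholder layer; self-loops disabled so only key->query edges exist.
conv = GATConv(dim, dim, heads=1, add_self_loops=False)

# return_attention_weights=True makes the layer also return (edge_index, alpha).
out, (ei, alpha) = conv(x, edge_index, return_attention_weights=True)

# Arrange the per-edge coefficients of head 0 into a (query x key) matrix.
att = torch.zeros(num_queries, num_keys)
for (s, d), coef in zip(ei.t().tolist(), alpha[:, 0].tolist()):
    att[d - num_keys, s] = coef

plt.imshow(att.numpy(), cmap='viridis')
plt.xlabel('key node')
plt.ylabel('query node')
plt.colorbar()
plt.show()
```

After training on DictionaryLookup, a GAT head plotted this way collapses to the same column for every query row, while a GATv2 head can place its maximum on a different key per query.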
Thanks very much for your reply!!
Recently, I found the same pattern in the attention matrices of ViT (Vision Transformer), which also uses the self-attention mechanism. If we regard ViT as a graph model, I think this phenomenon may be connected to GAT. So, can I use the code in this repository to generate the result in Figure 1(a)? If not, could you release the related code?
In my (perhaps naive) opinion, the phenomenon in Figure 1(a) may be related to some underlying weakness of the self-attention mechanism. Have you investigated the cause of this phenomenon?
Thanks again!
Our main analysis is of the GAT formulation. In the appendix of our paper, you can find an additional analysis of dot-product attention (e.g., Transformers).
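For readers comparing the formulations, here is a simplified single-head sketch of the three scoring functions being discussed. The weights are random stand-ins (not parameters from the paper); the code is only meant to make the static-vs-dynamic distinction concrete.

```python
import math
import torch
import torch.nn.functional as F

dim = 8
W = torch.randn(dim, dim)                    # shared node transformation (GAT)
a1, a2 = torch.randn(dim), torch.randn(dim)  # attention vector a = [a1 || a2]
W2 = torch.randn(dim, 2 * dim)               # transformation of [h_i || h_j] (GATv2)
a_v2 = torch.randn(dim)
Wq, Wk = torch.randn(dim, dim), torch.randn(dim, dim)  # query/key maps (dot-product)

def score_gat(h_i, h_j):
    # GAT:   e(h_i, h_j) = LeakyReLU(a1^T W h_i + a2^T W h_j)
    # The key term a2^T W h_j does not interact with h_i, so after the
    # monotone LeakyReLU every query ranks the keys identically ("static").
    return F.leaky_relu(a1 @ (W @ h_i) + a2 @ (W @ h_j))

def score_gatv2(h_i, h_j):
    # GATv2: e(h_i, h_j) = a^T LeakyReLU(W [h_i || h_j])
    # The nonlinearity is applied before a^T, so the ranking of keys
    # can depend on the query ("dynamic").
    return a_v2 @ F.leaky_relu(W2 @ torch.cat([h_i, h_j]))

def score_dot_product(h_i, h_j):
    # Scaled dot-product attention, as in Transformers.
    return (Wq @ h_i) @ (Wk @ h_j) / math.sqrt(dim)
```

Because `score_gat` adds a query-only term and a key-only term before a monotone nonlinearity, softmax-normalizing over the keys selects the same key for every query; that is the limitation Figure 1(a) visualizes.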