Recover attention scores
Is it possible to recover the attention scores from the Fast Attention module?
I don't believe that's possible, because the order of computation is Q'(K'^T V): K'^T V is computed first, so the L × L attention matrix Q'K'^T is never formed. It would be interesting to know if someone has a different idea or workaround.
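Here is a minimal NumPy sketch of why the scores are not available. It assumes FAVOR+-style positive random features and non-causal attention. The helpers `positive_random_features` and `linear_attention` are hypothetical illustrations, not part of the FastAttention API. Because K'^T V is summed over the sequence first, the intermediate is only (m, d), and no L × L matrix ever exists to read back.

```python
import numpy as np

def positive_random_features(x, w):
    """Positive random features phi(x) = exp(w^T x - |x|^2 / 2) / sqrt(m).

    x: (L, d) queries or keys; w: (m, d) projection rows drawn from N(0, I).
    """
    m = w.shape[0]
    proj = x @ w.T                                          # (L, m)
    half_sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)  # (L, 1)
    return np.exp(proj - half_sq_norm) / np.sqrt(m)

def linear_attention(q, k, v, w):
    """Hypothetical helper: computes D^{-1} Q'(K'^T V) without an L x L matrix."""
    scale = q.shape[-1] ** -0.25          # split the usual 1/sqrt(d) between Q and K
    qp = positive_random_features(q * scale, w)   # Q': (L, m)
    kp = positive_random_features(k * scale, w)   # K': (L, m)
    kv = kp.T @ v                         # K'^T V: (m, d_v), summed over positions
    normalizer = qp @ kp.sum(axis=0)      # Q'(K'^T 1): (L,), the row sums of D
    return (qp @ kv) / normalizer[:, None]

rng = np.random.default_rng(0)
L, d, m = 6, 4, 64
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
w = rng.standard_normal((m, d))
out = linear_attention(q, k, v, w)        # shape (L, d)
```

By associativity, (Q'K'^T)V gives the same result. The point is that the cheap ordering never has to compute Q'K'^T, so there is nothing for the module to return.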
In the Performer paper, the authors use a special V, a diagonal (identity) matrix of one-hot indicators, so that the attention outputs equal the attention scores. I suggest reading the paragraphs around Figure 10 in the paper. However, I am having trouble implementing it, because it is awkward to pass both the attention scores and the results to other functions/classes at the same time.
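Below is a hypothetical sketch of the identity-V idea, again in NumPy with FAVOR+-style positive random features (non-causal only). The function name `favor_attention` and its `return_attn` flag are illustrative assumptions, not the library's API. If V = I is fed through the same Q'(K'^T V) path, row i of the output is the sum over j of A_ij e_j, which is row i of the approximate normalized attention matrix D^{-1} Q' K'^T.

```python
import numpy as np

def positive_random_features(x, w):
    """Positive random features phi(x) = exp(w^T x - |x|^2 / 2) / sqrt(m)."""
    m = w.shape[0]
    half_sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ w.T - half_sq_norm) / np.sqrt(m)

def favor_attention(q, k, v, w, return_attn=False):
    """Hypothetical linear attention that can also return its implied scores."""
    scale = q.shape[-1] ** -0.25
    qp = positive_random_features(q * scale, w)   # Q': (L, m)
    kp = positive_random_features(k * scale, w)   # K': (L, m)
    normalizer = qp @ kp.sum(axis=0)              # (L,)

    def apply(values):
        # D^{-1} Q'(K'^T values), computed in the linear-cost order
        return (qp @ (kp.T @ values)) / normalizer[:, None]

    out = apply(v)
    if not return_attn:
        return out
    # V = I_L: each column is a one-hot indicator for one key position,
    # so the output is the (L, L) normalized attention matrix itself.
    attn = apply(np.eye(q.shape[0]))
    return out, attn

rng = np.random.default_rng(1)
L, d, m = 5, 8, 128
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
w = rng.standard_normal((m, d))
out, attn = favor_attention(q, k, v, w, return_attn=True)
```

Materializing the scores this way costs O(L² m), so it is practical for inspection or visualization rather than for training on long sequences. Returning a tuple only when `return_attn=True` leaves existing callers unchanged, which is one way to handle passing both values around.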
@lucidrains Could you please help us with implementing a way to obtain the attention weights?