TUPE

How to calculate correlation in Figure 2?

Redaimao opened this issue 3 years ago · 1 comment

Hi, thanks for your wonderful work. I am unsure how you derived the correlation matrix in Figure 2: which variables are used in the calculation, and how is the matrix itself computed?

For instance, does the word-to-word correlation matrix use the correlation of w_i W^{Q,1} and (w_j W^{K,1})^T as the variables? Also, how do you reduce the dimensionality for the correlation matrix, given that the standard correlation calculation only deals with scalar variables?

Thanks!

Redaimao · Aug 20 '21 15:08

Hi, @Redaimao

  1. We use the first self-attention layer for the calculation, since the later layers have residual connections.
  2. Note that the Transformer input is Dropout(LayerNorm(word_emb + pos_emb)). Since LayerNorm(a + b) != LayerNorm(a) + LayerNorm(b), you need to compute the word_emb and pos_emb components carefully.
  3. In the first self-attention layer, you can then compute the four correlation terms between word and position.
  4. For the final results, we randomly pick a batch (size = 32) and average the correlation matrices along the batch dimension. Since there are multiple heads, we pick one head for demonstration.
  5. I think you misunderstand the term "correlation" in our paper: it actually refers to the attention scores (the logits before softmax).
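The steps above can be sketched roughly as follows. This is not the authors' code: the embeddings and projection matrices below are random stand-ins for the model's learned weights, a single head is used, and the LayerNorm/Dropout handling from step 2 is omitted for brevity. It only illustrates how the four pre-softmax score maps (word-to-word, word-to-position, position-to-word, position-to-position) are formed and batch-averaged.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, d_model, d_head = 32, 16, 64, 16

# Hypothetical stand-ins for learned quantities: word embeddings for one
# random batch, absolute position embeddings (shared across the batch),
# and the first layer's Q/K projections for a single head.
word_emb = rng.normal(size=(batch, seq_len, d_model))
pos_emb = rng.normal(size=(seq_len, d_model))
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))

def scores(a, b):
    """Pre-softmax attention logits between projections of a and b."""
    return (a @ W_Q) @ (b @ W_K).swapaxes(-1, -2) / np.sqrt(d_head)

# Broadcast the position embeddings over the batch dimension.
pos = np.broadcast_to(pos_emb, (batch, seq_len, d_model))

# The four terms from expanding (w + p) W_Q ((w + p) W_K)^T,
# each averaged along the batch dimension (step 4).
word_to_word = scores(word_emb, word_emb).mean(axis=0)
word_to_pos  = scores(word_emb, pos).mean(axis=0)
pos_to_word  = scores(pos, word_emb).mean(axis=0)
pos_to_pos   = scores(pos, pos).mean(axis=0)

print(word_to_word.shape)  # (16, 16): one seq_len x seq_len map per term
```

Each of the four maps is a seq_len × seq_len matrix of logits, which is what gets visualized as a "correlation" panel (step 5).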

guolinke · Aug 24 '21 03:08