TUPE
How to calculate correlation in Figure 2?
Hi, thanks for your wonderful work. I am unsure how you derived the correlation matrices shown in Figure 2, both in terms of the variables used in the calculation and the derivation itself.
For instance, does the word-to-word correlation matrix use the correlation of w_i W^{Q,1} and (w_j W^{K,1})^T as the variables for the calculation? Also, how do you reduce the dimensionality of the correlation matrix, given that the standard correlation calculation only deals with scalar variables?
Thanks!
Hi, @Redaimao
- we use the first self-attention layer for the calculation, as the later layers have residual connections.
- then, there is `Dropout(LayerNorm(x))` applied to `word_emb + pos_emb` before the transformer. Since `LayerNorm(a + b) != LayerNorm(a) + LayerNorm(b)`, you need to calculate the `word_emb` and `pos_emb` contributions correctly.
- then, in the first self-attention layer, you can calculate the four correlation terms for word and position (see the expansion and sketch after this list).
- for the final results, we randomly pick a batch (size=32) and average the correlation matrices along the batch dimension. Then, since there are multiple heads, we pick one head for demonstration (as in the sketch below).
- I think you misunderstand the term `correlation` in our paper; it actually refers to the attention scores (the logits before softmax).
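For reference, the four terms come from expanding the first-layer attention logits, up to the 1/√d scaling (this is the standard decomposition behind Figure 2; per the LayerNorm caveat above, w_i and p_j denote the word/position contributions as they actually enter the first layer, not the raw embeddings):

```latex
((w_i + p_i) W^{Q,1}) \big( (w_j + p_j) W^{K,1} \big)^\top
  = \underbrace{w_i W^{Q,1} (W^{K,1})^\top w_j^\top}_{\text{word-to-word}}
  + \underbrace{w_i W^{Q,1} (W^{K,1})^\top p_j^\top}_{\text{word-to-pos}}
  + \underbrace{p_i W^{Q,1} (W^{K,1})^\top w_j^\top}_{\text{pos-to-word}}
  + \underbrace{p_i W^{Q,1} (W^{K,1})^\top p_j^\top}_{\text{pos-to-pos}}
```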
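And a minimal PyTorch sketch of the whole recipe, assuming `x_word` and `x_pos` are those per-layer contributions and `W_Q`/`W_K` are one head's first-layer query/key weights; all names here are hypothetical stand-ins, not identifiers from this repo:

```python
import torch

def four_score_terms(x_word, x_pos, W_Q, W_K):
    """Decompose one head's first-layer attention logits into four terms.

    x_word, x_pos: (batch, seq_len, d_model) word/position contributions
    W_Q, W_K:      (d_model, d_head) first-layer projections for one head
    Returns four (seq_len, seq_len) matrices, averaged over the batch.
    """
    q_w, q_p = x_word @ W_Q, x_pos @ W_Q   # queries from each contribution
    k_w, k_p = x_word @ W_K, x_pos @ W_K   # keys from each contribution
    scale = W_Q.shape[-1] ** 0.5
    # ((x_word + x_pos) W_Q) ((x_word + x_pos) W_K)^T expands into:
    terms = {
        "word-to-word": q_w @ k_w.transpose(-1, -2),
        "word-to-pos":  q_w @ k_p.transpose(-1, -2),
        "pos-to-word":  q_p @ k_w.transpose(-1, -2),
        "pos-to-pos":   q_p @ k_p.transpose(-1, -2),
    }
    # Average each (batch, seq, seq) score matrix over the random batch.
    return {name: (t / scale).mean(dim=0) for name, t in terms.items()}

# Usage with random stand-ins (batch of 32, as in the paper):
b, n, d, dh = 32, 128, 768, 64
mats = four_score_terms(torch.randn(b, n, d), torch.randn(b, n, d),
                        torch.randn(d, dh), torch.randn(d, dh))
print({k: v.shape for k, v in mats.items()})  # four (128, 128) matrices
```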