LMOps

Question of Equation 11

Open sigma-alpha-beta opened this issue 2 years ago • 2 comments

Thanks for your great work. There are two things I don't understand, and I'd like to ask for your advice. (1) As mentioned in the paper, [X'; X] denotes matrix concatenation. How are the two matrices concatenated: along the channel dimension, or more like along the batch dimension? (2) How is the step in Equation 11 derived? Thanks.

sigma-alpha-beta, Feb 23 '23 11:02

Thanks for your questions~ (1) They are concatenated along the token dimension. For example, if X' has shape (d, len_demo) and X has shape (d, len_query), then their concatenation has shape (d, len_demo + len_query). (2) This step follows from the rules of block matrix multiplication: [A1; A2] · [B1; B2]^T = A1 · B1^T + A2 · B2^T. Here, A1 is W_V · X', A2 is W_V · X, B1 is W_K · X', and B2 is W_K · X. As the simplest example, [1; 2] · [3; 4]^T = [1 * 3 + 2 * 4] = [11], and [1] · [3]^T + [2] · [4]^T = [3] + [8] = [11] as well.
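For anyone who wants to check the identity numerically, here is a minimal NumPy sketch. The sizes, the random values, and the variable names (`X_demo`, `X_query`, `X_cat`) are illustrative assumptions and are not taken from the paper's code; only the shapes and the token-dimension concatenation follow the explanation above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
d, len_demo, len_query = 4, 3, 2

X_demo = rng.standard_normal((d, len_demo))    # plays the role of X'
X_query = rng.standard_normal((d, len_query))  # plays the role of X
W_V = rng.standard_normal((d, d))
W_K = rng.standard_normal((d, d))

# [X'; X]: concatenate along the token dimension (axis 1).
X_cat = np.concatenate([X_demo, X_query], axis=1)  # shape (d, len_demo + len_query)

# Left-hand side: product computed on the concatenated matrix.
lhs = (W_V @ X_cat) @ (W_K @ X_cat).T

# Right-hand side: sum of the per-block products.
rhs = (W_V @ X_demo) @ (W_K @ X_demo).T + (W_V @ X_query) @ (W_K @ X_query).T

identity_holds = bool(np.allclose(lhs, rhs))

# The scalar example from the comment: [1; 2] . [3; 4]^T = [11].
tiny = np.array([[1, 2]]) @ np.array([[3, 4]]).T

print(X_cat.shape, identity_holds, tiny.tolist())
```

Both sides are (d, d) matrices, and the check passes for any choice of sizes because the product sums over the token dimension, which is exactly the dimension that was split into the two blocks.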

Hunter-DDM, Feb 24 '23 05:02

Thank you for your explanation. I got it.

sigma-alpha-beta, Feb 24 '23 07:02