Question of Equation 11

Open guoguangchao opened this issue 2 years ago • 2 comments

Thanks for your great work. There are two things I don't understand, and I would like to ask for your advice. (1) As mentioned in the paper, [X'; X] denotes matrix concatenation. How are the two matrices concatenated: along the channel dimension, or along something more like the batch dimension? (2) How is the step in Equation 11 derived? Thanks.

guoguangchao, Feb 23 '23 11:02

Thanks for your questions~ (1) They are concatenated along the token dimension. For example, if X' has shape (d, len_demo) and X has shape (d, len_query), then their concatenation has shape (d, len_demo + len_query). (2) This step follows from the rules of matrix multiplication. Since W · [X'; X] = [W · X'; W · X], both factors are block matrices split along the token dimension, and [A1; A2] · [B1; B2]^T = A1 · B1^T + A2 · B2^T. Here, A1 is W_V · X', A2 is W_V · X, B1 is W_K · X', and B2 is W_K · X. As the simplest example, [1; 2] · [3; 4]^T = [1 * 3 + 2 * 4] = [11], and [1] · [3] + [2] · [4] = [3] + [8] = [11] as well.
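For anyone who wants to check the identity numerically, here is a minimal sketch in plain Python. The sizes (d = 4, len_demo = 3, len_query = 2), the random integer entries, and the helper names (`matmul`, `transpose`, `hconcat`, `add`, `rand_matrix`) are all assumptions made for illustration; none of them come from the paper or the LMOps code. It only verifies the block-matrix step discussed above, not the rest of Equation 11.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]

def hconcat(a, b):
    """Concatenate along the token (column) dimension: (d, m) and (d, n) -> (d, m + n)."""
    return [ra + rb for ra, rb in zip(a, b)]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rows, cols, rng):
    """Small random integer matrix, so the comparison below is exact."""
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d, len_demo, len_query = 4, 3, 2          # illustrative sizes (assumed)

W_V = rand_matrix(d, d, rng)
W_K = rand_matrix(d, d, rng)
X_demo = rand_matrix(d, len_demo, rng)    # plays the role of X'
X_query = rand_matrix(d, len_query, rng)  # plays the role of X

X_cat = hconcat(X_demo, X_query)          # [X'; X], shape (d, len_demo + len_query)

# Left-hand side: W_V [X'; X] (W_K [X'; X])^T
lhs = matmul(matmul(W_V, X_cat), transpose(matmul(W_K, X_cat)))

# Right-hand side: W_V X' (W_K X')^T + W_V X (W_K X)^T
rhs = add(
    matmul(matmul(W_V, X_demo), transpose(matmul(W_K, X_demo))),
    matmul(matmul(W_V, X_query), transpose(matmul(W_K, X_query))),
)

shape_cat = (len(X_cat), len(X_cat[0]))
identity_holds = lhs == rhs
print(shape_cat, identity_holds)
```

Integer entries are used so that the two sides can be compared with `==`; with floating-point inputs a tolerance-based comparison would be the safer choice.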

Hunter-DDM, Feb 24 '23 05:02

Thank you for your explanation. I got it.

guoguangchao, Feb 24 '23 07:02