LMOps
Question about Equation 11
Thanks for your great work. There are two points I don't understand, and I would like to ask for your advice.
(1) As mentioned in the paper, [X'; X] denotes matrix concatenation. How are the two matrices concatenated: along the channel dimension, or more like the batch dimension?
(2) How is this step in Equation 11 derived? Thanks.
Thanks for your questions~

(1) They are concatenated along the token dimension. For example, if X' has shape (d, len_demo) and X has shape (d, len_query), their concatenation [X'; X] has shape (d, len_demo + len_query).

(2) This step follows from the rules of block matrix multiplication. First, W_V · [X'; X] = [W_V · X'; W_V · X], and likewise for W_K. Then [A1; A2] · [B1; B2]^T = A1 · B1^T + A2 · B2^T, which holds because A1 and B1 both have len_demo columns and A2 and B2 both have len_query columns. Here, A1 is W_V · X', A2 is W_V · X, B1 is W_K · X', and B2 is W_K · X. As the simplest example, [1; 2] · [3; 4]^T = [1 * 3 + 2 * 4] = [11], and [1] · [3]^T + [2] · [4]^T = [3] + [8] = [11] as well.
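Both answers can be checked numerically. The following is a minimal NumPy sketch, assuming random matrices and hypothetical sizes d, len_demo, and len_query; the names X_demo and X_query are illustrative stand-ins for X' and X, and W_V, W_K are taken to be square (d, d) only for simplicity. It is not code from the LMOps repository. It shows (1) that concatenating along the token (column) axis gives shape (d, len_demo + len_query), and (2) that the product of the concatenated blocks equals the sum of the per-block products.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
d, len_demo, len_query = 4, 3, 2

# Token matrices: one column per token (assumed layout, as in the answer).
X_demo = rng.standard_normal((d, len_demo))    # stands in for X'
X_query = rng.standard_normal((d, len_query))  # stands in for X
W_V = rng.standard_normal((d, d))  # assumed square projection
W_K = rng.standard_normal((d, d))  # assumed square projection

# (1) Concatenate along the token (column) axis: shape (d, len_demo + len_query).
X_cat = np.concatenate([X_demo, X_query], axis=1)
print(X_cat.shape)  # (4, 5)

# (2) Left side: project the concatenated tokens, then multiply.
lhs = (W_V @ X_cat) @ (W_K @ X_cat).T

# Right side: sum of the demonstration block and the query block.
rhs = (W_V @ X_demo) @ (W_K @ X_demo).T + (W_V @ X_query) @ (W_K @ X_query).T

print(np.allclose(lhs, rhs))  # True

# The toy example: [1; 2] and [3; 4] as 1 x 2 matrices (d = 1, two tokens).
A = np.array([[1, 2]])
B = np.array([[3, 4]])
print(A @ B.T)  # [[11]]
```

The check uses `np.allclose` rather than exact equality because the two sides accumulate floating-point sums in a different order.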
Thank you for your explanation. I got it.