mamba
Do X, B, C correspond to Q, K, V or to V, K, Q?
Hi authors, thanks for the amazing work!
I am a little confused. In your paper [Transformers are SSMs], you say: 'Note the analogy to standard attention architectures, where X,B,C correspond to the Q,K,V projections that are created in parallel.' But in your code [mamba2_simple.py], the comments say
# Split into 3 main branches: X, B, C
# These correspond to V, K, Q respectively in the SSM/attention duality
Which one is correct?
Thanks
X <-> V, B <-> K, C <-> Q
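To make the mapping concrete, here is a minimal sketch (not the repository's implementation; the tensor names, shapes, and the scalar decay `a` below are made up for illustration) of the SSD quadratic form Y = (L ∘ (C Bᵀ)) X next to its equivalent recurrence, which is why C plays the role of Q, B of K, and X of V in the duality:

```python
import torch

torch.manual_seed(0)
seqlen, d_state, d_head = 8, 4, 16

X = torch.randn(seqlen, d_head)   # plays the role of V ("values")
B = torch.randn(seqlen, d_state)  # plays the role of K ("keys")
C = torch.randn(seqlen, d_state)  # plays the role of Q ("queries")
a = torch.rand(seqlen)            # per-step scalar decay (illustrative values)

# Causal decay matrix: L[t, s] = a_{s+1} * ... * a_t for s <= t, else 0
log_a_cumsum = torch.cumsum(torch.log(a), dim=0)
L = torch.tril(torch.exp(log_a_cumsum[:, None] - log_a_cumsum[None, :]))

# Attention-like (quadratic) form: Y = (L ∘ (C B^T)) X,
# analogous to Y = (causal mask ∘ Q K^T) V in attention
Y_quadratic = (L * (C @ B.T)) @ X

# Equivalent recurrent (linear-time) SSM form:
#   h_t = a_t * h_{t-1} + outer(B_t, X_t),   y_t = C_t h_t
h = torch.zeros(d_state, d_head)
Y_recurrent = torch.empty_like(X)
for t in range(seqlen):
    h = a[t] * h + torch.outer(B[t], X[t])
    Y_recurrent[t] = C[t] @ h

print(torch.allclose(Y_quadratic, Y_recurrent, atol=1e-5))  # True
```

The quadratic form mirrors masked attention's Y = (mask ∘ Q Kᵀ) V, with the causal decay matrix L playing the part of the causal mask.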
Note the analogy to standard attention architectures, where X,B,C correspond to the Q,K,V projections that are created in parallel.
Where in the paper does it say this? That would be a mistake on our end.
Thanks for the reply.
It says here: