mamba
Do X, B, C correspond to Q, K, V or to V, K, Q?
Hi authors, thanks for the amazing work!
I am a little confused. In your paper [Transformers are SSMs], you say: 'Note the analogy to standard attention architectures, where X,B,C correspond to the Q,K,V projections that are created in parallel.' But in your code [mamba2_simple.py], the comments say
# Split into 3 main branches: X, B, C
# These correspond to V, K, Q respectively in the SSM/attention duality
Which one is correct?
Thanks
X <-> V, B <-> K, C <-> Q
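To make the mapping concrete, here is a minimal sketch (not the repository's implementation; the tensor names, shapes, and the scalar decay `a` below are made up for illustration) of the SSD quadratic form Y = (L ∘ (C Bᵀ)) X next to its equivalent recurrence, which is why C plays the role of Q, B of K, and X of V in the duality:

```python
import torch

torch.manual_seed(0)
seqlen, d_state, d_head = 8, 4, 16

X = torch.randn(seqlen, d_head)   # plays the role of V ("values")
B = torch.randn(seqlen, d_state)  # plays the role of K ("keys")
C = torch.randn(seqlen, d_state)  # plays the role of Q ("queries")
a = torch.rand(seqlen)            # per-step scalar decay (illustrative values)

# Causal decay matrix: L[t, s] = a_{s+1} * ... * a_t for s <= t, else 0
log_a_cumsum = torch.cumsum(torch.log(a), dim=0)
L = torch.tril(torch.exp(log_a_cumsum[:, None] - log_a_cumsum[None, :]))

# Attention-like (quadratic) form: Y = (L ∘ (C B^T)) X,
# analogous to Y = (causal mask ∘ Q K^T) V in attention
Y_quadratic = (L * (C @ B.T)) @ X

# Equivalent recurrent (linear-time) SSM form:
#   h_t = a_t * h_{t-1} + outer(B_t, X_t),   y_t = C_t h_t
h = torch.zeros(d_state, d_head)
Y_recurrent = torch.empty_like(X)
for t in range(seqlen):
    h = a[t] * h + torch.outer(B[t], X[t])
    Y_recurrent[t] = C[t] @ h

print(torch.allclose(Y_quadratic, Y_recurrent, atol=1e-5))  # True
```

The quadratic form mirrors masked attention's Y = (mask ∘ Q Kᵀ) V, with the causal decay matrix L playing the part of the causal mask.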
Note the analogy to standard attention architectures, where X,B,C correspond to the Q,K,V projections that are created in parallel.
Where in the paper does it say this? That would be a mistake on our end.
Thanks for the reply.
It says here: