
Do X, B, C correspond to Q, K, V or to V, K, Q?

Open MichaelWangGo opened this issue 8 months ago • 3 comments

Hi authors, thanks for the amazing work!

I am a little confused: in your paper [Transformers are SSMs] you write, "Note the analogy to standard attention architectures, where X, B, C correspond to the Q, K, V projections that are created in parallel." But in your code [mamba2_simple.py], the comments say

```python
# Split into 3 main branches: X, B, C
# These correspond to V, K, Q respectively in the SSM/attention duality
```

Which one is correct?

Thanks

MichaelWangGo avatar Mar 11 '25 21:03 MichaelWangGo

X <-> V
B <-> K
C <-> Q

tridao avatar Mar 11 '25 21:03 tridao
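To make the mapping concrete, here is a minimal NumPy sketch of the duality (not from the thread; the simplified setting with A = I and no decay terms is an assumption, so the causal mask L is all ones below the diagonal). It shows that the sequential SSM recurrence and the attention-like "dual" form give the same output, with C acting as the query, B as the key, and X as the value:

```python
import numpy as np

T, d, n = 4, 2, 3                 # sequence length, head dim, state dim
rng = np.random.default_rng(0)
X = rng.standard_normal((T, d))   # "values"  (V)
B = rng.standard_normal((T, n))   # "keys"    (K)
C = rng.standard_normal((T, n))   # "queries" (Q)

# Attention-like dual form: y = (L * (C B^T)) X, where L is the causal mask.
# Compare to attention's y = mask(Q K^T) V: C <-> Q, B <-> K, X <-> V.
L = np.tril(np.ones((T, T)))
Y_dual = (L * (C @ B.T)) @ X

# Same result via the sequential recurrence h_t = h_{t-1} + B_t^T X_t
h = np.zeros((n, d))
Y_rec = np.zeros((T, d))
for t in range(T):
    h = h + np.outer(B[t], X[t])  # write value X_t into the state, keyed by B_t
    Y_rec[t] = C[t] @ h           # read the state out with query C_t

assert np.allclose(Y_dual, Y_rec)
```

Expanding the recurrence gives y_t = sum over s <= t of (C_t . B_s) X_s, which is exactly the masked (C B^T) X product above, so the roles of Q, K, V fall out directly.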

> Note the analogy to standard attention architectures, where X, B, C correspond to the Q, K, V projections that are created in parallel.

Where in the paper does it say this? That would be a mistake on our end.

albertfgu avatar Mar 24 '25 17:03 albertfgu

> Note the analogy to standard attention architectures, where X, B, C correspond to the Q, K, V projections that are created in parallel.
>
> Where in the paper does it say this? That would be a mistake on our end.

Thanks for the reply.

It says here:

[screenshot of the relevant passage from the paper]

MichaelWangGo avatar Mar 24 '25 17:03 MichaelWangGo