
Sequence parallelism in the mixer (Context Parallelism)

Open TranSirius opened this issue 7 months ago • 1 comments

The general question is: does mamba-ssm currently support sequence parallelism in the mixer?

I noticed that Section 8.2 of the Mamba2 paper proposes a potential way to split activations across multiple devices when mixing information among tokens. Does the current version of mamba-ssm support such a context-parallelism scheme?

By the way, if that can be confirmed, I think the suggested implementation should be incorporated into the fast scan algorithm. Since the scan is a parallel tree-traversal algorithm, each node would be computed on a single device. In the leaf-to-root pass, communication is invoked whenever two sibling nodes are computed on different devices, in order to transmit the hidden information; in the root-to-leaf pass, communication is triggered in the same way. I have attached a simple illustration of how CP could be implemented. As a result, CP_SIZE is also determined by the number of children per node in the fast scan implementation. (I just want to confirm whether my understanding is correct, thanks.)

image
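To make the idea above concrete, here is a minimal sketch, not taken from mamba-ssm. It assumes a scalar linear recurrence `h_t = a_t * h_{t-1} + b_t` and simulates the CP ranks with a plain Python list instead of real devices. The names `sequential_scan`, `combine`, and `context_parallel_scan` are hypothetical. Each simulated rank scans its own chunk locally, only the per-chunk `(decay, state)` summaries are passed between ranks, and each rank then corrects its local result with the incoming state.

```python
import numpy as np

def sequential_scan(a, b, h0=0.0):
    """Reference: h_t = a_t * h_{t-1} + b_t, computed token by token."""
    h = np.empty_like(b)
    prev = h0
    for t in range(len(a)):
        prev = a[t] * prev + b[t]
        h[t] = prev
    return h

def combine(left, right):
    """Associative operator on (decay, state) summaries; `left` is earlier in time."""
    a_l, b_l = left
    a_r, b_r = right
    return a_l * a_r, a_r * b_l + b_r

def context_parallel_scan(a, b, cp_size):
    """Simulated CP: each 'rank' owns one contiguous chunk of the sequence."""
    assert 1 <= cp_size <= len(a), "every rank needs at least one token"
    a_chunks = np.array_split(a, cp_size)
    b_chunks = np.array_split(b, cp_size)

    # Step 1 (local, no communication): scan each chunk from a zero state.
    local_h = [sequential_scan(ac, bc) for ac, bc in zip(a_chunks, b_chunks)]
    summaries = [(np.prod(ac), lh[-1]) for ac, lh in zip(a_chunks, local_h)]

    # Step 2 (the only cross-rank traffic): exclusive scan over chunk summaries.
    # Done sequentially here for clarity; because `combine` is associative,
    # a tree-shaped (leaf-to-root, root-to-leaf) schedule would also work.
    incoming = []
    carry = (1.0, 0.0)  # identity element of `combine`
    for summary in summaries:
        incoming.append(carry[1])  # state entering this rank's chunk
        carry = combine(carry, summary)

    # Step 3 (local): add the contribution of the incoming state.
    fixed = [lh + np.cumprod(ac) * h_in
             for lh, ac, h_in in zip(local_h, a_chunks, incoming)]
    return np.concatenate(fixed)

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, size=16)
b = rng.normal(size=16)
print(np.allclose(context_parallel_scan(a, b, cp_size=4), sequential_scan(a, b)))
```

In this sketch the per-rank communication is a single `(decay, state)` pair, independent of chunk length. Whether mamba-ssm's kernels expose the hooks needed to do this across real devices is exactly the open question here.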

TranSirius avatar Jul 21 '24 03:07 TranSirius