
[RFC] Support context parallelism

Open sustcsonglin opened this issue 8 months ago • 0 comments

Proposal

Support context parallelism for all linear attention models.

Rationale

One of the major advantages of linear attention is that it enables long sequence modeling. However, for training and prefilling, a single GPU will often lack sufficient memory to process the entire input, making context parallelism essential.
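Context parallelism is a natural fit here because causal linear attention maintains a fixed-size recurrent state (d_k x d_v per head), so only that state needs to cross a shard boundary, not the full KV history. A minimal single-process sketch of the idea (plain NumPy, not the FLA API; in a real implementation each chunk would live on a different rank and the state would be passed via point-to-point communication, e.g. `torch.distributed.send`/`recv`):

```python
import numpy as np

def linear_attention(q, k, v, S0=None):
    """Causal linear attention over one sequence chunk.

    q, k: (T, d_k); v: (T, d_v). The running state S (d_k, d_v)
    accumulates k_t^T v_t, so o_t = q_t @ S_t depends on the past
    only through S -- which is what makes context parallelism cheap.
    Returns the chunk outputs and the final state to hand to the
    next shard.
    """
    d_k, d_v = q.shape[1], v.shape[1]
    S = np.zeros((d_k, d_v)) if S0 is None else S0.copy()
    out = np.empty((q.shape[0], d_v))
    for t in range(q.shape[0]):
        S = S + np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out, S

# Simulated context parallelism: split the sequence into chunks
# (one per hypothetical rank) and pass only the state between them.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))

full_out, _ = linear_attention(q, k, v)          # single-device reference

S, parts = None, []
for r in range(2):                               # two "ranks", 4 tokens each
    sl = slice(4 * r, 4 * (r + 1))
    o, S = linear_attention(q[sl], k[sl], v[sl], S)
    parts.append(o)

assert np.allclose(full_out, np.concatenate(parts))
```

The chunked computation reproduces the full-sequence result exactly, since the inter-chunk dependency is entirely captured by the (d_k, d_v) state. The communication volume per boundary is therefore independent of sequence length, unlike ring-attention-style schemes for softmax attention.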

sustcsonglin · Mar 07 '25