[RFC] Support context parallelism
Proposal
Support context parallelism for all linear attention models.
Rationale
One of the major advantages of linear attention is its suitability for long-sequence modeling. During training and prefilling, however, a single GPU often lacks the memory to process the entire input, so the sequence must be sharded across devices, making context parallelism essential.
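To make the idea concrete, below is a minimal sketch (not fla code; all names are hypothetical) of how context parallelism could work for a plain, non-gated linear attention layer. Because the layer reduces to a recurrence over a fixed-size state `S_t = S_{t-1} + k_t^T v_t`, a rank holding a contiguous sequence shard only needs the state accumulated by the ranks holding earlier tokens, not their full activations.

```python
# Minimal sketch of context parallelism for simple (non-gated) linear
# attention, assuming sequence shards are laid out contiguously across
# ranks (rank 0 holds the earliest tokens). Illustrative only, not the
# fla API.
import torch
import torch.distributed as dist

def local_linear_attn(q, k, v, state_in):
    """Causal linear attention over one local shard.

    q, k, v: (T, d); state_in: (d, d) recurrent state carried in from
    all earlier shards, i.e. the sum of k_t^T v_t over previous tokens.
    Returns the local output and the updated state to pass onward.
    """
    # Inter-shard part: every local query attends to the incoming state.
    o = q @ state_in
    # Intra-shard part: strictly causal attention within the shard.
    attn = (q @ k.t()).tril()          # (T, T), masked to s <= t
    o = o + attn @ v
    # Fold the local keys/values into the running state.
    state_out = state_in + k.t() @ v   # (d, d)
    return o, state_out

def context_parallel_linear_attn(q, k, v):
    """Pipeline-style context parallelism: each rank waits for the
    recurrent state from the previous rank, processes its shard, and
    forwards the updated state. Assumes torch.distributed is already
    initialized (e.g. via torchrun)."""
    rank, world = dist.get_rank(), dist.get_world_size()
    d = q.shape[-1]
    state = torch.zeros(d, d, device=q.device, dtype=q.dtype)
    if rank > 0:
        dist.recv(state, src=rank - 1)        # state from earlier tokens
    o, state = local_linear_attn(q, k, v, state)
    if rank < world - 1:
        dist.send(state, dst=rank + 1)        # pass state to later tokens
    return o
```

A real implementation would presumably chunk each shard and overlap the state hand-off with compute rather than serializing ranks as above, and gated/decay variants (e.g. GLA) would additionally need the incoming state scaled by the shard's cumulative decay.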