sys_reading
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
https://arxiv.org/pdf/2310.03294.pdf
https://x.com/rulinshao/status/1711836608742437159?s=46