zero-bubble-pipeline-parallelism

Support sequence parallel on main branch

Open: ufotalent opened this issue 1 year ago • 1 comment

ufotalent, Dec 26 '23 06:12

Lazily computing the partial gradients of the weights with the aid of a queue is really smart! @ufotalent

However, I don't believe you need to support sequence parallel: it does nothing to reduce the total number of tokens processed on a single machine, and it brings only small improvements for batchnorm and dropout.

Context parallel would be much preferable.
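For anyone reading along, here is a rough sketch of the queue idea I praised above. It is not the repository's code: `WeightGradQueue`, `Linear`, and `backward_input` are hypothetical names I made up, and a toy pure-Python linear layer stands in for a real model. The backward pass returns the input gradient right away and only enqueues the weight-gradient work, which runs later when the queue is flushed.

```python
from collections import deque


class WeightGradQueue:
    """Hypothetical helper: holds deferred weight-gradient computations."""

    def __init__(self):
        self._pending = deque()

    def put(self, fn):
        self._pending.append(fn)

    def flush(self):
        # Run the deferred computations in FIFO order.
        while self._pending:
            self._pending.popleft()()


class Linear:
    """Toy layer y = W x whose backward pass is split in two parts."""

    def __init__(self, weight):
        self.weight = weight  # list of rows
        self.grad_weight = [[0.0] * len(row) for row in weight]

    def forward(self, x):
        self.x = x  # saved for the backward pass
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]

    def backward_input(self, grad_out, queue):
        x = self.x

        def weight_grad():
            # dW[i][j] += grad_out[i] * x[j], executed only on flush.
            for i, g in enumerate(grad_out):
                for j, xj in enumerate(x):
                    self.grad_weight[i][j] += g * xj

        queue.put(weight_grad)
        # dx = W^T grad_out, computed immediately so that the previous
        # stage is not kept waiting.
        return [
            sum(self.weight[i][j] * grad_out[i] for i in range(len(grad_out)))
            for j in range(len(x))
        ]


queue = WeightGradQueue()
layer = Linear([[1.0, 2.0], [3.0, 4.0]])
layer.forward([1.0, 1.0])
grad_in = layer.backward_input([1.0, 0.5], queue)
before_flush = [row[:] for row in layer.grad_weight]  # still all zeros
queue.flush()
after_flush = layer.grad_weight
```

The point is only the ordering: `grad_in` is available before any weight gradient has been accumulated, so the deferred work can be scheduled into what would otherwise be idle time.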