DeepSpeed

sequence parallel for uneven heads

Open inkcherry opened this issue 1 year ago • 2 comments

In sequence parallel (Ulysses), the number of attention heads must be divisible by the sequence parallel size, which prevents some models/workloads from using the sequence parallel size they need. This PR implements uneven splitting of heads in the all-to-all.

  • Supports both batch-first (b, s, ...) and sequence-first (s, b, ...) layouts.
  • Added unit tests with numerical checks. Also tested locally with 7 heads at sp=4 and with 20 heads at sp=8; both passed.
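
For readers unfamiliar with the idea, here is a minimal sketch of how heads might be distributed when the head count is not divisible by the sequence parallel size. The helper names `uneven_head_counts` and `head_ranges` are hypothetical and not taken from the DeepSpeed code. The policy shown (the first `num_heads % sp_size` ranks receive one extra head) and the requirement of at least one head per rank are assumptions for illustration; the PR may use a different assignment.

```python
from typing import List, Tuple


def uneven_head_counts(num_heads: int, sp_size: int) -> List[int]:
    """Return how many heads each sequence-parallel rank would own.

    Assumed policy (illustrative only): every rank gets num_heads // sp_size
    heads, and the first (num_heads % sp_size) ranks get one extra.
    """
    if sp_size <= 0:
        raise ValueError("sp_size must be positive")
    if num_heads < sp_size:
        # Assumption of this sketch: each rank owns at least one head.
        raise ValueError("num_heads must be >= sp_size")
    base, extra = divmod(num_heads, sp_size)
    return [base + 1 if rank < extra else base for rank in range(sp_size)]


def head_ranges(num_heads: int, sp_size: int) -> List[Tuple[int, int]]:
    """Return the half-open [start, end) head index range owned by each rank."""
    ranges = []
    start = 0
    for count in uneven_head_counts(num_heads, sp_size):
        ranges.append((start, start + count))
        start += count
    return ranges


# The two configurations mentioned in the description.
print(uneven_head_counts(7, 4))   # [2, 2, 2, 1]
print(uneven_head_counts(20, 8))  # [3, 3, 3, 3, 2, 2, 2, 2]
print(head_ranges(7, 4))          # [(0, 2), (2, 4), (4, 6), (6, 7)]
```

Per-rank counts like these are what an uneven all-to-all would need as its split sizes along the head dimension, in place of a single uniform chunk size.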

inkcherry avatar Aug 21 '24 13:08 inkcherry

@inkcherry thanks, I have no further questions. Hi @tjruwase @loadams, this PR enables sequence parallel for models whose number of heads is not a power of two, which was requested by a customer. Can this PR be reviewed? Thanks!

delock avatar Aug 30 '24 01:08 delock

Thanks @delock, things look good now, but we will just need to get a review.

loadams avatar Sep 03 '24 23:09 loadams