
fix uneven issue & add balance autotp

Open Yejing-Lai opened this issue 2 years ago • 6 comments

This PR aims to balance the shard size across workers as evenly as possible.

  1. We refactor the tp_shard logic so that AutoTP works when split_shape % num_kv_heads != 0.
  2. When num_kv_heads is defined, the attention module relies on it for sharding, while the mlp and lm_head modules can use a near-even division to get more balanced shards, which gives better performance (see the sketch after this list).
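To make the two split strategies concrete, here is a minimal sketch under assumed sizes; `shard_sizes_by_heads` and `shard_sizes_near_even` are hypothetical helpers for illustration, not the PR's actual tp_shard code:

```python
# Minimal sketch of the two split strategies described above; helper names and
# example sizes are illustrative, not the actual DeepSpeed tp_shard code.

def shard_sizes_by_heads(total_size: int, num_kv_heads: int, num_ranks: int) -> list[int]:
    """Attention-style split: every rank gets a whole number of KV heads."""
    assert total_size % num_kv_heads == 0
    head_dim = total_size // num_kv_heads
    base, rem = divmod(num_kv_heads, num_ranks)
    # The first `rem` ranks each take one extra head.
    return [(base + (1 if r < rem else 0)) * head_dim for r in range(num_ranks)]

def shard_sizes_near_even(total_size: int, num_ranks: int) -> list[int]:
    """mlp/lm_head-style split: shard sizes differ by at most one element."""
    base, rem = divmod(total_size, num_ranks)
    return [base + (1 if r < rem else 0) for r in range(num_ranks)]

# 32 KV heads with head_dim 128 on 3 ranks -> 11/11/10 heads per rank
print(shard_sizes_by_heads(32 * 128, num_kv_heads=32, num_ranks=3))  # [1408, 1408, 1280]
# mlp intermediate size 11008 on 3 ranks -> near-even columns
print(shard_sizes_near_even(11008, num_ranks=3))                     # [3670, 3669, 3669]
```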

Yejing-Lai · Nov 17 '23 01:11

Hi @RezaYazdaniAminabadi @delock. Could you please help review this PR? Thanks~

Yejing-Lai · Nov 17 '23 01:11

Hi @Yejing-Lai, can you give some explanation of the need for a granularity of 64 elements? https://github.com/microsoft/DeepSpeed/pull/4697/files#diff-214e32993d5440123080193836e988f024771aa4f6931c614ef9ad42a493f398R31

delock · Nov 17 '23 01:11

> Hi @Yejing-Lai, can you give some explanation of the need for a granularity of 64 elements? https://github.com/microsoft/DeepSpeed/pull/4697/files#diff-214e32993d5440123080193836e988f024771aa4f6931c614ef9ad42a493f398R31

DNN libraries favor tensor sizes with power-of-2 granularity, so we pick 64 as a common granularity size.
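As a rough illustration of that point (the helper below is assumed for the example, not the PR's code), per-rank shard sizes can be kept at multiples of 64 while one rank absorbs the remainder:

```python
# Illustrative only: keep each rank's shard a multiple of 64 columns so the
# DNN library sees friendlier tensor shapes; not the actual DeepSpeed code.
GRANULARITY = 64

def round_shard_to_granularity(total_size: int, num_ranks: int, gran: int = GRANULARITY) -> list[int]:
    """Give each rank a multiple of `gran` columns; the last rank takes the remainder."""
    per_rank = (total_size // num_ranks) // gran * gran
    sizes = [per_rank] * num_ranks
    sizes[-1] += total_size - per_rank * num_ranks
    return sizes

print(round_shard_to_granularity(11008, 3))  # [3648, 3648, 3712]
```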

Yejing-Lai · Nov 17 '23 03:11

Hi @RezaYazdaniAminabadi, FYI this PR improves AutoTP sharding when the number of heads is not divisible by the number of ranks. MLP layers will have better load balance when running AutoTP on 3 devices or 3 CPU sub-NUMA clusters.

delock · Nov 21 '23 01:11

@Yejing-Lai, please help resolve the conflict.

tjruwase · Dec 13 '23 20:12

> @Yejing-Lai, please help resolve the conflict.

Hi @tjruwase. I resolved the conflict. Can you approve the workflows? Thanks~

Yejing-Lai · Dec 20 '23 14:12

Hi @tjruwase. The conflict has been resolved. Could you please help review this PR? Thanks~

Yejing-Lai · Jan 10 '24 01:01

Hi @tjruwase, is this PR still under review, or is it ready to merge? We are working on an Intel Extension for PyTorch release and want to know whether this PR will be included in the next DeepSpeed release. Thanks!

delock · Jan 12 '24 01:01

> Hi @tjruwase, is this PR still under review, or is it ready to merge? We are working on an Intel Extension for PyTorch release and want to know whether this PR will be included in the next DeepSpeed release. Thanks!

@delock, sorry for the delay. This should be reviewed soon and will be included in the next release.

tjruwase · Jan 12 '24 10:01