DeepSpeed
fix uneven issue & add balance autotp
This PR aims to balance the shard sizes across workers as evenly as possible.
- We refactor the tp_shard logic so that AutoTP works when split_shape % num_kv_heads != 0.
- When num_kv_heads is defined, the attention module relies on it for sharding, while the mlp and lm_head modules can use near-even division to get more balanced shards, which yields better performance (see the sketch below).
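For reviewers, here is a minimal sketch of the intended behavior; the function names and example sizes are illustrative only, not the actual tp_shard API. Attention shards are built from whole KV heads, while mlp/lm_head shards use near-even division so sizes differ by at most one element.

```python
def balanced_shard_sizes(total_size: int, num_ranks: int) -> list:
    """Split total_size into num_ranks parts whose sizes differ by at most 1."""
    base, remainder = divmod(total_size, num_ranks)
    # The first `remainder` ranks receive one extra element.
    return [base + 1 if rank < remainder else base for rank in range(num_ranks)]


def attention_shard_sizes(hidden_size: int, num_kv_heads: int, num_ranks: int) -> list:
    """Shard attention by whole KV heads: distribute heads near-evenly,
    then convert head counts back to element counts."""
    head_dim = hidden_size // num_kv_heads
    heads_per_rank = balanced_shard_sizes(num_kv_heads, num_ranks)
    return [h * head_dim for h in heads_per_rank]


if __name__ == "__main__":
    # MLP intermediate size 11008 split over 3 ranks: sizes differ by at most 1.
    print(balanced_shard_sizes(11008, 3))          # [3670, 3669, 3669]
    # 32 KV heads of dim 128 split over 3 ranks: whole heads per rank.
    print(attention_shard_sizes(32 * 128, 32, 3))  # [1408, 1408, 1280]
```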
Hi @RezaYazdaniAminabadi @delock. Could you please help review this PR? Thanks~
Hi @Yejing-Lai, can you give some explanation of the need to have a granularity of 64 elements? https://github.com/microsoft/DeepSpeed/pull/4697/files#diff-214e32993d5440123080193836e988f024771aa4f6931c614ef9ad42a493f398R31
DNN libraries favor tensor sizes with a power-of-2 granularity; we pick 64 as a common granularity size.
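As an illustration only (the helper below is not the PR's code), a near-even split can be aligned to a 64-element grain by distributing whole grains across ranks and letting the last rank absorb any remainder:

```python
GRAIN = 64  # illustrative default; matches the 64-element granularity discussed above

def aligned_shard_sizes(total_size: int, num_ranks: int, grain: int = GRAIN) -> list:
    """Split total_size into num_ranks shards, each a multiple of `grain`,
    except the last shard, which absorbs any sub-grain remainder."""
    whole_grains, leftover = divmod(total_size, grain)
    base, extra = divmod(whole_grains, num_ranks)
    # Distribute whole 64-element grains near-evenly across ranks.
    sizes = [(base + 1 if rank < extra else base) * grain for rank in range(num_ranks)]
    # Give the sub-grain leftover (< 64 elements) to the last rank.
    sizes[-1] += leftover
    return sizes

if __name__ == "__main__":
    print(aligned_shard_sizes(11008, 3))  # [3712, 3648, 3648]
```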
Hi @RezaYazdaniAminabadi, FYI this PR improves AutoTP sharding when the number of heads is not divisible by the number of ranks. MLP layers will have better load balance when running AutoTP on 3 devices or 3 CPU sub-NUMA clusters.
@Yejing-Lai, please help resolve the conflict.
Hi @tjruwase. I resolved the conflict. Can you approve the workflows? Thanks~
Hi @tjruwase. The conflict has been resolved. Could you please help review this PR? Thanks~
Hi @tjruwase, is this PR still under review or ready to merge? We are working on an Intel Extension for PyTorch release and want to know whether this PR will be included in the next DeepSpeed release. Thanks!
@delock, sorry for the delay. This should be reviewed soon and will be included in the next release.