DeepSpeed (Do not merge) (CPU) aggregation of a few recent fixes/optimizations

This PR aggregates a few recent fixes in order to support customers. It contains the following PRs, along with some other minor fixes:

[ ] Fix for MoE https://github.com/microsoft/DeepSpeed/pull/5519
[ ] Add compressed ops for 1-bit Adam https://github.com/microsoft/DeepSpeed/pull/5473
[ ] Enable Yuan AutoTP https://github.com/microsoft/DeepSpeed/pull/5428
[x] Add more tests for XPU https://github.com/microsoft/DeepSpeed/pull/5427
[x] Support SHM inference_all_reduce in TorchBackend https://github.com/microsoft/DeepSpeed/pull/5391
[ ] Skip tests if a certain OpBuilder is not implemented https://github.com/microsoft/DeepSpeed/pull/5372
[x] Enable XPU CI https://github.com/microsoft/DeepSpeed/pull/5376
[x] Enable AutoTP for Mixtral-8x7B https://github.com/microsoft/DeepSpeed/pull/5257
[x] Fix fused_qkv model accuracy issue https://github.com/microsoft/DeepSpeed/pull/5217
[x] Enable load from meta for T5 and Mistral https://github.com/microsoft/DeepSpeed/pull/4958
[x] Enable Falcon model load from_config https://github.com/microsoft/DeepSpeed/pull/4783
[x] Enable AutoTP for Baichuan model https://github.com/microsoft/DeepSpeed/pull/4721
[x] Enable AutoTP for Falcon-40b with odd number of heads https://github.com/microsoft/DeepSpeed/pull/4712
[x] More balanced sharding for MLP layers with odd number of heads https://github.com/microsoft/DeepSpeed/pull/4697
[x] Fix lm_head replacement issue when the checkpoint is not loaded https://github.com/microsoft/DeepSpeed/pull/4522
[x] Fix CPU inference workflow https://github.com/microsoft/DeepSpeed/pull/4430
[x] Support StarCoder https://github.com/microsoft/DeepSpeed/pull/4896
[x] Fix Falcon-40B accuracy issue https://github.com/microsoft/DeepSpeed/pull/4895

In addition, we are tracking the following PRs (they are not in this PR branch, but we hope they will be merged):

[x] Needed by CPU training https://github.com/microsoft/DeepSpeed/pull/3842
[x] Short kernel sequence to graph support https://github.com/microsoft/DeepSpeed/pull/4318
[x] Larger scale support with MPICH launcher https://github.com/microsoft/DeepSpeed/pull/4699
[x] XPU upstream https://github.com/microsoft/DeepSpeed/pull/4547
[ ] WOQ support for AutoTP https://github.com/microsoft/DeepSpeed/pull/4750
[x] Support models with only a .safetensors model file https://github.com/microsoft/DeepSpeed/pull/4854
[x] (new) Update list of supported AutoTP models https://github.com/microsoft/DeepSpeed/pull/4960

Jul 10 '23 10:07 delock

This looks like it has already been merged, so can we close this PR?

Nov 10 '23 21:11 loadams

Hi @loadams, this PR has some new changes that are being worked on for merging into master, and I have updated the PR description. Could you help reopen this PR in draft mode? Thanks!

From time to time we receive AutoTP support requests for new models or bug reports, so we sometimes need to submit new PRs to DeepSpeed to add that support. We also add those changes to this PR for early customer access before they go into master. Hope this helps.

Nov 27 '23 05:11 delock

Apologies, yes, happy to reopen.

Nov 27 '23 14:11 loadams