Ma, Guokai
Hi @loadams This PR has some new changes that are being worked on for merging into master, and I have updated the PR description. Can you help reopen this PR in draft mode? Thanks!...
> @abhilash1910, thanks for this PR. I think this PR needs some work that leverages PR #3633 for the following reasons. > > 1. As you observed, strings like `torch.cpu.DoubleTensor`...
> Yes, dtype is better. Some additional changes in _reduce_non_expert_gradients and _reduce_expert_gradients will be needed accordingly.
Hi @abhilash1910 can you check and fix the following error? https://github.com/microsoft/DeepSpeed/actions/runs/6952457019/job/18941598524?pr=3842#step:8:4568 ``` File "/tmp/actions-runner/_work/DeepSpeed/DeepSpeed/unit-test-venv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 126, in split_half_float_double_sparse assert t.dtype in supported_types, f"attempting to reduce an unsupported grad type: {t.dtype}"...
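To illustrate the dtype-based grouping being discussed in these comments, here is a minimal sketch. This is not DeepSpeed's actual `split_half_float_double_sparse` implementation; the `FakeTensor` class is a hypothetical stand-in (carrying only a `dtype` tag) so the snippet runs without torch, and the assertion mirrors the error message seen in the CI log above.

```python
# Sketch: bucket gradient tensors by dtype instead of comparing type
# strings like "torch.cpu.DoubleTensor". With real torch tensors,
# `t.dtype` plays the same role as the string tag used here.
from collections import defaultdict


class FakeTensor:
    """Minimal stand-in for a tensor, carrying only a dtype tag."""

    def __init__(self, dtype):
        self.dtype = dtype


def split_by_dtype(tensors, supported_dtypes):
    """Group tensors into buckets keyed by dtype.

    Raises AssertionError for any tensor whose dtype is not supported,
    mirroring the check in the CI failure above.
    """
    buckets = defaultdict(list)
    for t in tensors:
        assert t.dtype in supported_dtypes, (
            f"attempting to reduce an unsupported grad type: {t.dtype}")
        buckets[t.dtype].append(t)
    return buckets


grads = [FakeTensor("float16"), FakeTensor("float32"), FakeTensor("float16")]
buckets = split_by_dtype(grads, {"float16", "float32", "float64"})
```

Grouping on `dtype` sidesteps the accelerator-specific type-string problem (`torch.cpu.DoubleTensor` vs `torch.cuda.DoubleTensor`), since a dtype is device-independent.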
Hi @abhilash1910 can you clarify whether the current CI failures are related to your PR or are just a test issue? Thanks!
Hi @abhilash1910 some suggestions: 1. provide more details (hw, sw, log ...) of your local run, so there might be a hint of the difference. 2. try to modify the test as...
@tjruwase @jeffra could you assign a reviewer for this PR? This PR fixes OPT checkpoint sharded loading with AutoTP and improves OPT+AutoTP usability; it is needed when running OPT models on...
@RezaYazdaniAminabadi can you review this PR? This PR fixes OPT sharded loading for AutoTP. Previously only OPT-125m had sharded checkpoint loading; with this fix, OPT >350m will have sharded checkpoint...
@RezaYazdaniAminabadi Hi, a quick check on whether this PR is still under consideration. We have verified this PR for the CPU accelerator and would like to know whether it could be merged into...
Does it make sense to also update [docs/_tutorials/automatic-tensor-parallelism.md](https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/automatic-tensor-parallelism.md) to include this model in the supported list?