Ma, Guokai
Is https://github.com/microsoft/DeepSpeed/pull/4071 related to this request?
> > Is #4071 related to this request?
>
> Yes, but besides changing the triton version, the kernel needs updates as well.

Hi @YizhouZ, what specific kernel error did you meet?...
https://huggingface.co/IEITYuan/Yuan2-102B-hf
Hi @tjruwase, we received a request to support AutoTP for the Yuan model (https://huggingface.co/IEITYuan/Yuan2-102B-hf). This model has a special QKV format and also has convolution layers that need special treatment in tensor parallelism. This...
> Hi @delock - FYI could you resolve the merge conflicts on this PR so it can be reviewed/tests run?

Hi @loadams....
@tjruwase this PR is an approach to abstract the generic part of 1-bit Adam and implement the accelerator-dependent part with a DeepSpeed custom op builder, so 1-bit Adam does not need to depend...
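The separation described above can be sketched roughly as follows. This is a hypothetical illustration, not DeepSpeed's real interface: the class and function names (`CompressionBackend`, `ReferenceBackend`, `one_bit_step`) are invented for this sketch. The idea is that the generic 1-bit error-feedback logic lives in one place and only the compression op is supplied per accelerator through a builder-style backend.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch: the generic 1-bit Adam logic is accelerator-agnostic
# and delegates compression to a backend loaded per accelerator. All names
# here are illustrative, not DeepSpeed's actual op builder API.
class CompressionBackend(ABC):
    @abstractmethod
    def compress(self, values):
        """Return (signs, scale) for a 1-bit compression of `values`."""

class ReferenceBackend(CompressionBackend):
    # Pure-Python fallback; a CUDA or XPU builder would load a custom op
    # (compiled kernel) here instead of running this loop.
    def compress(self, values):
        scale = sum(abs(v) for v in values) / len(values)
        signs = [1 if v >= 0 else -1 for v in values]
        return signs, scale

def one_bit_step(grads, backend: CompressionBackend):
    # Generic part: compress the gradient, then reconstruct the
    # approximation from the sign bits and the shared scale.
    signs, scale = backend.compress(grads)
    return [s * scale for s in signs]

print(one_bit_step([0.5, -1.5, 1.0], ReferenceBackend()))  # → [1.0, -1.0, 1.0]
```

With this split, only the backend needs to be ported per accelerator; the optimizer logic stays shared.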
Hi @RezaYazdaniAminabadi, this PR solves Falcon-40b AutoTP with uneven sharding, e.g. on 3 ranks. Can this PR be reviewed? Thanks!
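To illustrate what uneven sharding means here: when the number of attention heads is not divisible by the tensor-parallel world size, the heads cannot be split evenly, so some ranks must take one extra head. The helper below is a minimal sketch of that partitioning (the function name `shard_sizes` and the 128-head count are assumptions for illustration, not the PR's actual code).

```python
# Hypothetical sketch of uneven tensor-parallel sharding: the first `rem`
# ranks each take one extra head so every head is assigned exactly once.
def shard_sizes(num_heads: int, world_size: int) -> list:
    base, rem = divmod(num_heads, world_size)
    return [base + (1 if r < rem else 0) for r in range(world_size)]

# Example: 128 heads split across 3 ranks.
print(shard_sizes(128, 3))  # → [43, 43, 42]
```

The key invariant is that the shard sizes always sum back to the total head count, so no rank computes a duplicate or missing head.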
@tjruwase from the failure log this looks like an environment issue. Has this already been resolved? Thanks!

```
FAILED tests/deepspeed/test_deepspeed.py::TrainerIntegrationDeepSpeed::test_early_get_last_lr_zero2_fp16 - deepspeed.ops.op_builder.builder.CUDAMismatchException: >- DeepSpeed Op Builder: Installed CUDA version 11.6...
```
Hi @YizhouZ, can you show the command line launched by deepspeed before and after your PR, illustrating how your PR reduces the command line length? Thanks!
Hi @mrwyattii, do you have any comments on this PR? This PR is essential when running DeepSpeed training on thousands of nodes with MPICH. The former implementation would...
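The scaling concern behind the two launcher comments above can be shown with a small sketch. Assuming the problem is per-node argument expansion (an assumption on my part; the PR bodies are truncated here), a command line that names every node grows linearly with the cluster, while writing the node list to a hostfile keeps the launch command constant-size:

```python
import os
import tempfile

# Illustrative node list for a large cluster; names are made up.
nodes = [f"node{i:04d}" for i in range(4096)]

# Per-node expansion: the command line grows with the cluster size.
argv_style = " ".join(f"--host {n}" for n in nodes)

# Hostfile style: the node list lives in a file, so the command line
# stays constant-size no matter how many nodes there are.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(nodes))
    hostfile = f.name
hostfile_style = f"--hostfile {hostfile}"

print(len(argv_style), len(hostfile_style))
os.unlink(hostfile)
```

At thousands of nodes the expanded form can exceed OS argument-length limits, which is why a file-based (or broadcast-based) mechanism matters for MPICH-style launches.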