Ma, Guokai
Is https://github.com/microsoft/DeepSpeed/pull/4071 related to this request?
> > Is #4071 related to this request?
>
> Yes, but besides changing the triton version, the kernel needs updates as well.

Hi @YizhouZ, what specific kernel error did you meet?...
https://huggingface.co/IEITYuan/Yuan2-102B-hf
Hi @tjruwase, we received a request to support AutoTP for the Yuan model (https://huggingface.co/IEITYuan/Yuan2-102B-hf). This model has a special QKV format and also has convolution layers that need special treatment in tensor parallelism. This...
> Hi @delock - FYI could you resolve the merge conflicts on this PR so it can be reviewed/tests run?

Hi @loadams....
@tjruwase this PR is an approach to abstract the generic part of 1-bit Adam and implement the accelerator-dependent part with a DeepSpeed custom op builder, so 1-bit Adam does not need to depend...
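The separation described above can be sketched roughly as follows. This is a hypothetical illustration, not DeepSpeed's real interface: the class and function names (`CompressionBackend`, `ReferenceBackend`, `one_bit_step`) are invented for this sketch. The idea is that the generic 1-bit error-feedback logic lives in one place and only the compression op is supplied per accelerator through a builder-style backend.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch: the generic 1-bit Adam logic is accelerator-agnostic
# and delegates compression to a backend loaded per accelerator. All names
# here are illustrative, not DeepSpeed's actual op builder API.
class CompressionBackend(ABC):
    @abstractmethod
    def compress(self, values):
        """Return (signs, scale) for a 1-bit compression of `values`."""

class ReferenceBackend(CompressionBackend):
    # Pure-Python fallback; a CUDA or XPU builder would load a custom op
    # (compiled kernel) here instead of running this loop.
    def compress(self, values):
        scale = sum(abs(v) for v in values) / len(values)
        signs = [1 if v >= 0 else -1 for v in values]
        return signs, scale

def one_bit_step(grads, backend: CompressionBackend):
    # Generic part: compress the gradient, then reconstruct the
    # approximation from the sign bits and the shared scale.
    signs, scale = backend.compress(grads)
    return [s * scale for s in signs]

print(one_bit_step([0.5, -1.5, 1.0], ReferenceBackend()))  # → [1.0, -1.0, 1.0]
```

With this split, only the backend needs to be ported per accelerator; the optimizer logic stays shared.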
Hi @RezaYazdaniAminabadi, this PR solves Falcon-40b AutoTP with uneven sharding, e.g. on 3 ranks. Can this PR be reviewed? Thanks!
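To illustrate what uneven sharding means here: when the number of attention heads is not divisible by the tensor-parallel world size, the heads cannot be split evenly, so some ranks must take one extra head. The helper below is a minimal sketch of that partitioning (the function name `shard_sizes` and the 128-head count are assumptions for illustration, not the PR's actual code).

```python
# Hypothetical sketch of uneven tensor-parallel sharding: the first `rem`
# ranks each take one extra head so every head is assigned exactly once.
def shard_sizes(num_heads: int, world_size: int) -> list:
    base, rem = divmod(num_heads, world_size)
    return [base + (1 if r < rem else 0) for r in range(world_size)]

# Example: 128 heads split across 3 ranks.
print(shard_sizes(128, 3))  # → [43, 43, 42]
```

The key invariant is that the shard sizes always sum back to the total head count, so no rank computes a duplicate or missing head.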
@tjruwase from the failure log this looks like an environment issue. Has this already been resolved? Thanks!

```
FAILED tests/deepspeed/test_deepspeed.py::TrainerIntegrationDeepSpeed::test_early_get_last_lr_zero2_fp16 - deepspeed.ops.op_builder.builder.CUDAMismatchException: >- DeepSpeed Op Builder: Installed CUDA version 11.6...
```
Hi @YizhouZ, can you show the command line launched by deepspeed before and after your PR, illustrating how your PR reduces the command line length? Thanks!
Hi @mrwyattii, do you have any comments on this PR? This PR is essential when running DeepSpeed training on thousands of nodes with MPICH. The former implementation would...
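The scaling concern behind the two launcher comments above can be shown with a small sketch. Assuming the problem is per-node argument expansion (an assumption on my part; the PR bodies are truncated here), a command line that names every node grows linearly with the cluster, while writing the node list to a hostfile keeps the launch command constant-size:

```python
import os
import tempfile

# Illustrative node list for a large cluster; names are made up.
nodes = [f"node{i:04d}" for i in range(4096)]

# Per-node expansion: the command line grows with the cluster size.
argv_style = " ".join(f"--host {n}" for n in nodes)

# Hostfile style: the node list lives in a file, so the command line
# stays constant-size no matter how many nodes there are.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(nodes))
    hostfile = f.name
hostfile_style = f"--hostfile {hostfile}"

print(len(argv_style), len(hostfile_style))
os.unlink(hostfile)
```

At thousands of nodes the expanded form can exceed OS argument-length limits, which is why a file-based (or broadcast-based) mechanism matters for MPICH-style launches.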