Ranggi Hwang

2 comments by Ranggi Hwang

What seems strange to me is that Hugging Face integrated DeepSpeed support for the MoE layer with ZeRO-3...

Hi @awan-10, I just want to run inference on both models, Google/switch-transformer-XXL and Google/switch-transformer-c. These models are larger than 500 GB. Furthermore, to run smaller models, I think...
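
For context, a minimal sketch of the ZeRO-3 parameter-offload inference pattern from the Transformers/DeepSpeed integration is shown below. It uses google/switch-base-8 as a small stand-in checkpoint (the XXL and C checkpoints referenced above are hundreds of GB and would need multi-GPU, multi-node, or NVMe offload); the exact import path of HfDeepSpeedConfig varies by transformers version, so treat this as an assumption-laden example rather than a verified recipe.

```python
# Sketch: ZeRO-3 CPU-offload inference for a (small) Switch Transformer checkpoint.
# Assumptions: transformers with DeepSpeed integration installed, one GPU available,
# and google/switch-base-8 used only as a placeholder for the much larger XXL/C models.
import torch
import deepspeed
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration
from transformers.integrations import HfDeepSpeedConfig  # path may differ by version

model_name = "google/switch-base-8"  # stand-in for the XXL / C checkpoints

ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Offload parameters to CPU RAM so the model need not fit in GPU memory.
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,  # required by DeepSpeed even for inference
}

# Must be created BEFORE from_pretrained so the weights are partitioned as they load.
dschf = HfDeepSpeedConfig(ds_config)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = SwitchTransformersForConditionalGeneration.from_pretrained(model_name)

# Wrap the model in a DeepSpeed engine with no optimizer (inference only).
engine = deepspeed.initialize(model=model, config=ds_config)[0]
engine.module.eval()

inputs = tokenizer(
    "translate English to German: Hello world", return_tensors="pt"
).to(engine.device)
with torch.no_grad():
    out = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```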