Ranggi Hwang

2 comments by Ranggi Hwang

What seems strange to me is that Hugging Face integrated DeepSpeed support for the MoE layer with ZeRO-3...

Hi @awan-10, I just want to run inference on both models, Google/switch-transformer-XXL and Google/switch-transformer-c. These models are larger than 500 GB. Furthermore, to run smaller models, I think...
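
For context, a minimal sketch of the ZeRO-3 parameter-offload inference pattern from the Transformers/DeepSpeed integration is shown below. It uses google/switch-base-8 as a small stand-in checkpoint (the XXL and C checkpoints referenced above are hundreds of GB and would need multi-GPU, multi-node, or NVMe offload); the exact import path of HfDeepSpeedConfig varies by transformers version, so treat this as an assumption-laden example rather than a verified recipe.

```python
# Sketch: ZeRO-3 CPU-offload inference for a (small) Switch Transformer checkpoint.
# Assumptions: transformers with DeepSpeed integration installed, one GPU available,
# and google/switch-base-8 used only as a placeholder for the much larger XXL/C models.
import torch
import deepspeed
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration
from transformers.integrations import HfDeepSpeedConfig  # path may differ by version

model_name = "google/switch-base-8"  # stand-in for the XXL / C checkpoints

ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Offload parameters to CPU RAM so the model need not fit in GPU memory.
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,  # required by DeepSpeed even for inference
}

# Must be created BEFORE from_pretrained so the weights are partitioned as they load.
dschf = HfDeepSpeedConfig(ds_config)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = SwitchTransformersForConditionalGeneration.from_pretrained(model_name)

# Wrap the model in a DeepSpeed engine with no optimizer (inference only).
engine = deepspeed.initialize(model=model, config=ds_config)[0]
engine.module.eval()

inputs = tokenizer(
    "translate English to German: Hello world", return_tensors="pt"
).to(engine.device)
with torch.no_grad():
    out = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```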