Ranggi Hwang
Results: 2 comments by Ranggi Hwang
What puzzles me is that Hugging Face integrated DeepSpeed support for the MoE layer under ZeRO-3...
Hi @awan-10, I just want to run inference on both models, Google/switch-transformer-XXL and Google/switch-transformer-c. These models are more than 500 GB in size. Furthermore, to run smaller models, I think...