BlackBearBiscuit


### Describe the issue

Issue: Have you experimented with MoE models larger than 13B? I am using ZeRO-2 with EP_SIZE=8, and initializing the optimizer state fails with CUDA out of memory. ZeRO-3 does not support MoE, and due to hardware constraints I cannot use offload either. Should I consider Megatron-DeepSpeed instead?

Environment:

```
GPU: 8×A100-80G
Deepspeed version: 0.10.0
Torch version:
Transformers version:
Tokenizers version:
```

Command:

```
PASTE...
```
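For reference, a minimal DeepSpeed config sketch for the ZeRO-2 setup described above might look like the following. The batch size and precision values are placeholders, not taken from the issue; `offload_optimizer` is commented out since offload is ruled out here, but it is the usual knob for moving optimizer state off the GPU when initialization runs out of memory:

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "contiguous_gradients": true,
    "overlap_comm": true
  }
}
```

Note that the expert-parallel degree (EP_SIZE=8) is not set in this JSON; it is passed when constructing the MoE layers in code (e.g. the `ep_size` argument of DeepSpeed's MoE layer).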