FlagAI [Question]: Why is training 8x faster than Megatron + DeepSpeed ZeRO-2?

Open wj-Mcat opened this issue 1 year ago • 2 comments

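The WeChat official account post claims that training is 8x faster than Megatron (the name is written incorrectly in the official account post; please fix it) + DeepSpeed ZeRO-2. Could you share the training method behind this claim? Is it the simple training script provided in the examples? Could you also share the experiment scripts used for both sides of the comparison?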

Looking forward to you sharing this.

Jun 09 '23 16:06 wj-Mcat

Really looking forward to an official reply.

Jun 10 '23 09:06 wj-Mcat

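We have integrated Megatron and DeepSpeed support into the FlagAI framework. We reached the conclusion above when training a 33B model, comparing on two machines with A100 40G GPUs. We will post the detailed information once it has been organized.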

Jun 12 '23 01:06 marscrazy

Closing this for now. If you have further questions, please reopen the issue. Thanks.

Jun 22 '23 11:06 ftgreat