gpt-neox
Add Mixture of Experts
Add Mixture of Experts support, following "DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times".
It should be a fairly simple addition, as the codebase they open-sourced is largely similar to ours (same base model, although we have diverged a bit since).
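
As a rough illustration of what the integration might look like, here is a minimal sketch that wraps a dense feed-forward block in DeepSpeed's `deepspeed.moe.layer.MoE` so it could stand in for the MLP inside a transformer layer. The class names (`ExpertMLP`, `MoEMLP`) and constructor arguments are assumptions for illustration, not the actual gpt-neox integration point; the real change would need to hook into our existing parallel MLP and pass the auxiliary loss through the training loop.

```python
# Hypothetical sketch of swapping the dense MLP for a DeepSpeed MoE layer.
# ExpertMLP / MoEMLP are illustrative names, not classes from this repo.
import torch.nn as nn
from deepspeed.moe.layer import MoE


class ExpertMLP(nn.Module):
    """Dense feed-forward block used as the per-expert module."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, ffn_hidden_size)
        self.fc2 = nn.Linear(ffn_hidden_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


class MoEMLP(nn.Module):
    """Possible drop-in replacement for the dense MLP, routing tokens to experts."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int,
                 num_experts: int = 8, top_k: int = 1, ep_size: int = 1):
        super().__init__()
        self.moe = MoE(
            hidden_size=hidden_size,
            expert=ExpertMLP(hidden_size, ffn_hidden_size),
            num_experts=num_experts,   # experts per MoE layer
            ep_size=ep_size,           # expert-parallel world size
            k=top_k,                   # route each token to the top-k experts
        )

    def forward(self, hidden_states):
        # DeepSpeed's MoE returns (output, auxiliary load-balancing loss,
        # expert counts); the aux loss would need to be added to the
        # training loss by the caller.
        output, l_aux, _ = self.moe(hidden_states)
        return output, l_aux
```

The main open questions are where the auxiliary load-balancing loss gets accumulated in our training loop and how expert parallelism (`ep_size`) interacts with the model/pipeline parallelism we already use.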