gpt-neox
Add Mixture of Experts
Add Mixture of Experts support, following "DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times".
It should be a fairly simple addition, as the codebase they open-sourced is largely similar to ours (same base model, although we have diverged a bit since).
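
As a rough illustration of what the integration might look like, here is a minimal sketch that wraps a dense feed-forward block in DeepSpeed's `deepspeed.moe.layer.MoE` so it could stand in for the MLP inside a transformer layer. The class names (`ExpertMLP`, `MoEMLP`) and constructor arguments are assumptions for illustration, not the actual gpt-neox integration point; the real change would need to hook into our existing parallel MLP and pass the auxiliary loss through the training loop.

```python
# Hypothetical sketch of swapping the dense MLP for a DeepSpeed MoE layer.
# ExpertMLP / MoEMLP are illustrative names, not classes from this repo.
import torch.nn as nn
from deepspeed.moe.layer import MoE


class ExpertMLP(nn.Module):
    """Dense feed-forward block used as the per-expert module."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, ffn_hidden_size)
        self.fc2 = nn.Linear(ffn_hidden_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


class MoEMLP(nn.Module):
    """Possible drop-in replacement for the dense MLP, routing tokens to experts."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int,
                 num_experts: int = 8, top_k: int = 1, ep_size: int = 1):
        super().__init__()
        self.moe = MoE(
            hidden_size=hidden_size,
            expert=ExpertMLP(hidden_size, ffn_hidden_size),
            num_experts=num_experts,   # experts per MoE layer
            ep_size=ep_size,           # expert-parallel world size
            k=top_k,                   # route each token to the top-k experts
        )

    def forward(self, hidden_states):
        # DeepSpeed's MoE returns (output, auxiliary load-balancing loss,
        # expert counts); the aux loss would need to be added to the
        # training loss by the caller.
        output, l_aux, _ = self.moe(hidden_states)
        return output, l_aux
```

The main open questions are where the auxiliary load-balancing loss gets accumulated in our training loop and how expert parallelism (`ep_size`) interacts with the model/pipeline parallelism we already use.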