Trevor Gale

These changes add support for using MegaBlocks dMoE and MoE layers in Megatron. MegaBlocks is exposed through an [adapter](https://github.com/NVIDIA/Megatron-LM/compare/main...stanford-futuredata:Megatron-LM:basic-megablocks-integration#diff-aa9d60b130b2ce6bd6810f247a0e1770fe0d0279d01cf8b491cd03df2c72be7a) which isolates the `megablocks` package dependency so that it does not...
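The description is cut off, but the adapter idea it names (keeping the `megablocks` import in one place) is a common pattern for optional dependencies. The sketch below shows one generic way to do this in Python. It is an illustration under stated assumptions, not the PR's actual code. Only the package name `megablocks` comes from the text. The helpers `optional_dependency_available`, `load_optional_dependency` and `build_mlp` are hypothetical, and no MegaBlocks API is called.

```python
import importlib
import importlib.util
from types import ModuleType

# Hypothetical adapter sketch: only the package name comes from the PR text;
# every function below is illustrative and not taken from Megatron-LM.
_OPTIONAL_PACKAGE = "megablocks"


def optional_dependency_available(name: str = _OPTIONAL_PACKAGE) -> bool:
    """Return True if top-level package `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def load_optional_dependency(name: str = _OPTIONAL_PACKAGE) -> ModuleType:
    """Import `name` lazily, raising a clear error only when it is actually needed."""
    if not optional_dependency_available(name):
        raise ImportError(
            f"'{name}' is required for MoE layers; install it to enable them."
        )
    return importlib.import_module(name)


def build_mlp(use_moe: bool) -> str:
    """Hypothetical layer factory: only the MoE path touches the optional package."""
    if use_moe:
        load_optional_dependency()  # fails here, not at module import time
        return "moe"
    return "dense"
```

With this layout, a dense-only model never imports the optional package, so it keeps working when that package is not installed. The error surfaces only when an MoE layer is requested.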