Implement Mixture of Depth and Experts (MoDE)
Given that MegaBlocks is highly optimized for sparse MoE models like Mixtral, I am requesting support for Mixture-of-Depths-and-Experts (MoDE), a variant recently proposed by Google DeepMind. Because each block only processes a capacity-limited subset of tokens, it promises much faster training and inference through the added sparsity (see the routing sketch after the links below).
Paper: https://arxiv.org/abs/2404.02258
I found two implementations:
- https://github.com/epfml/llm-baselines/blob/mixture_of_depth/src/models/mod.py
- https://github.com/kyegomez/Mixture-of-Depths/blob/main/mixture_of_depths/main.py
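To illustrate the routing pattern MoDE relies on, here is a minimal sketch of Mixture-of-Depths-style token routing in PyTorch. It assumes a generic `block` module (which could be a MegaBlocks MoE layer) and a hypothetical `capacity_ratio` parameter; none of these names come from the MegaBlocks API, and this is not the paper's exact formulation, just a rough outline of the idea.

```python
import torch
import torch.nn as nn


class MoDRouter(nn.Module):
    """Routes a fixed fraction of tokens through `block`; the rest skip it."""

    def __init__(self, d_model: int, block: nn.Module, capacity_ratio: float = 0.125):
        super().__init__()
        self.router = nn.Linear(d_model, 1)   # scalar routing score per token
        self.block = block                    # e.g. an MoE feed-forward layer
        self.capacity_ratio = capacity_ratio  # fraction of tokens processed (illustrative)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        k = max(1, int(seq_len * self.capacity_ratio))

        scores = self.router(x).squeeze(-1)             # (batch, seq_len)
        topk_scores, topk_idx = scores.topk(k, dim=-1)  # tokens that get compute

        # Gather the selected tokens and run only those through the heavy block.
        idx = topk_idx.unsqueeze(-1).expand(-1, -1, d_model)  # (batch, k, d_model)
        selected = torch.gather(x, 1, idx)
        processed = self.block(selected)

        # Scale by the routing weight so the router receives a gradient,
        # then scatter the residual-updated tokens back; skipped tokens pass through unchanged.
        processed = processed * torch.sigmoid(topk_scores).unsqueeze(-1)
        return x.scatter(1, idx, selected + processed)
```

In practice you would wrap an existing MegaBlocks MoE layer, e.g. `MoDRouter(d_model, moe_layer, capacity_ratio=0.125)`, so each transformer block only spends expert compute on the routed tokens. The two implementations linked above handle the details (causal-safe routing at inference, auxiliary losses) that this sketch leaves out.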