mixture-of-experts topic
Generalizable-Mixture-of-Experts
GMoE could be the next backbone model for many kinds of generalization tasks.
pipegoose
Large-scale 4D-parallel pre-training of Mixture-of-Experts models for 🤗 transformers *(still a work in progress)*
soft-mixture-of-experts
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf); the dispatch/combine scheme is sketched after this list
llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention blocks, akin to mixture-of-experts
Neural-Implicit-Dict
[ICML 2022] "Neural Implicit Dictionary via Mixture-of-Expert Training" by Peihao Wang, Zhiwen Fan, Tianlong Chen, Zhangyang Wang
MoSE-AUSeg
The official code repo for the paper "Mixture of Stochastic Experts for Modeling Aleatoric Uncertainty in Segmentation". (ICLR 2023)
soft-moe
PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
Pytorch_mixture-of-experts
PyTorch implementation of MoE (mixture of experts); the top-k routing pattern is sketched after this list
mixtools
Tools for Analyzing Finite Mixture Models
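Most of the repositories above implement some variant of sparse routing: a learned gate picks a few experts per token and mixes their outputs. Below is a minimal sketch of that pattern in PyTorch; the class and parameter names (`TopKMoE`, `num_experts`, `top_k`) are illustrative and are not the API of any listed project.

```python
# Minimal sketch of a sparse top-k mixture-of-experts layer (illustrative, not any repo's API).
import torch
import torch.nn as nn


class TopKMoE(nn.Module):
    """Route each token to its top-k experts and mix their outputs by gate weight."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2, mult: int = 4):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # router producing expert logits
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, mult * dim), nn.GELU(), nn.Linear(mult * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        tokens = x.reshape(-1, d)                            # flatten to (num_tokens, dim)
        logits = self.gate(tokens)                           # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # per-token expert choices
        weights = weights.softmax(dim=-1)                    # renormalize over the chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = indices[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out.reshape(b, s, d)


# usage: a toy forward pass
moe = TopKMoE(dim=64, num_experts=4, top_k=2)
y = moe(torch.randn(2, 10, 64))  # -> shape (2, 10, 64)
```

Production implementations add load-balancing losses and capacity limits on top of this basic routing loop; the sketch keeps only the gate-select-mix core.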
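The soft-mixture-of-experts and soft-moe entries implement Soft MoE from the paper linked above, which replaces hard token-to-expert assignment with learned dispatch and combine weights over a fixed set of slots. A minimal sketch of that scheme follows; names are again illustrative rather than either repo's API.

```python
# Minimal sketch of Soft MoE dispatch/combine (illustrative, not any repo's API).
import torch
import torch.nn as nn


class SoftMoE(nn.Module):
    """Soft dispatch/combine: every token contributes to every slot, no hard routing."""

    def __init__(self, dim: int, num_experts: int = 4, slots_per_expert: int = 1, mult: int = 4):
        super().__init__()
        self.num_experts = num_experts
        self.slots_per_expert = slots_per_expert
        num_slots = num_experts * slots_per_expert
        self.slot_embeds = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, mult * dim), nn.GELU(), nn.Linear(mult * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        logits = torch.einsum('bnd,md->bnm', x, self.slot_embeds)  # token-slot affinities
        dispatch = logits.softmax(dim=1)   # each slot = convex combination of tokens
        combine = logits.softmax(dim=2)    # each token = convex combination of slot outputs

        slots = torch.einsum('bnm,bnd->bmd', dispatch, x)          # (batch, num_slots, dim)
        slots = slots.reshape(x.shape[0], self.num_experts, self.slots_per_expert, -1)
        slot_out = torch.stack(
            [expert(slots[:, i]) for i, expert in enumerate(self.experts)], dim=1
        )                                                          # (batch, experts, slots/expert, dim)
        slot_out = slot_out.reshape(x.shape[0], -1, x.shape[-1])   # back to (batch, num_slots, dim)
        return torch.einsum('bnm,bmd->bnd', combine, slot_out)     # (batch, tokens, dim)


# usage: a toy forward pass
layer = SoftMoE(dim=64, num_experts=4, slots_per_expert=2)
y = layer(torch.randn(2, 10, 64))  # -> shape (2, 10, 64)
```

Because every token reaches every slot, the layer is fully differentiable and avoids the token-dropping and load-balancing issues of hard top-k routing, at the cost of dense dispatch/combine matrices.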