SwissArmyTransformer
Inside `MixtralMlpMixin()`, the MoE part only computes the expert logits, but I don't see any dispatch logic.
https://github.com/THUDM/SwissArmyTransformer/blob/main/sat/model/official/mixtral_model.py
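
To make clear what I mean by "dispatch logic", here is a minimal pure-Python sketch of the top-k routing I expected to find after the logits are computed. It is not code from SwissArmyTransformer: the names `route_top_k` and `moe_forward` are hypothetical, and the softmax, top-2 selection, and renormalization steps are my assumption about how Mixtral-style routing usually works.

```python
import math

def softmax(xs):
    # Numerically stable softmax over one token's router logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    # Hypothetical helper: pick the k highest-probability experts for one
    # token and renormalize their weights so they sum to 1.
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(tokens, all_logits, experts, k=2):
    # Dispatch step: for each expert, collect the token positions routed to it.
    assignments = {e: [] for e in range(len(experts))}
    for pos, logits in enumerate(all_logits):
        for e, w in route_top_k(logits, k):
            assignments[e].append((pos, w))
    # Combine step: run each expert only on its own tokens and accumulate
    # the weighted outputs back into the original token positions.
    outputs = [[0.0] * len(t) for t in tokens]
    for e, items in assignments.items():
        for pos, w in items:
            y = experts[e](tokens[pos])
            for d, v in enumerate(y):
                outputs[pos][d] += w * v
    return outputs

# Toy experts that just scale their input (stand-ins for real MLPs).
experts = [
    lambda x: [1.0 * v for v in x],
    lambda x: [2.0 * v for v in x],
    lambda x: [3.0 * v for v in x],
]
tokens = [[1.0, 2.0]]
all_logits = [[0.0, 0.0, -100.0]]  # router favors experts 0 and 1 equally
result = moe_forward(tokens, all_logits, experts)
```

In this toy case the token goes to experts 0 and 1 with equal weight and expert 2 is never called. This per-expert grouping and weighted combine is the part I could not locate in `mixtral_model.py`.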