
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

5 MoEBERT issues

I just removed the "--do_train" and "--do_eval" lines in bert_base_mnli_example.sh and added a "--do_predict" line. But when I run it, a "Need to turn the model to a MoE first" error happens....

Hi, from the paper I understood that the most important parameters are shared across the different experts. However, in the code I didn't see how to ensure the parameters are...

Hi @SimiaoZuo, you mentioned that we need to fine-tune first. But how do we obtain the fine-tuned model and pass it into `bert_base_mnli_example.sh`? Many thanks!

Hi @SimiaoZuo, I encountered problems when running `bash bert_base_mnli_example.sh`. The error message is below. Thanks very much! ``` /home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/distributed/launch.py:164: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead "The...

What is the performance difference between the token-level gate and the sentence-level gate? And what value of alpha is used for the load-balancing loss?
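For context on the load-balancing question above: MoE models commonly add an auxiliary loss that encourages the router to spread tokens evenly across experts, scaled by a coefficient alpha. The sketch below is a generic Switch-Transformer-style version in plain Python, not necessarily MoEBERT's exact implementation; the function name and the default alpha are illustrative assumptions.

```python
def load_balance_loss(router_probs, expert_assignment, alpha=0.01):
    """Generic MoE load-balancing auxiliary loss (a sketch, not
    necessarily MoEBERT's exact formulation).

    router_probs: per-token gate distributions, each summing to 1.
    expert_assignment: chosen expert index for each token.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])

    # f_i: fraction of tokens dispatched to expert i
    frac = [0.0] * num_experts
    for e in expert_assignment:
        frac[e] += 1.0 / num_tokens

    # P_i: mean gate probability assigned to expert i
    mean_prob = [sum(p[i] for p in router_probs) / num_tokens
                 for i in range(num_experts)]

    # alpha * E * sum_i f_i * P_i; minimized when routing is uniform
    return alpha * num_experts * sum(f * p for f, p in zip(frac, mean_prob))
```

With perfectly balanced routing the loss equals alpha; skewed routing (e.g. every token sent to one expert with high confidence) increases it, so the gradient pushes the router back toward uniform usage.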