
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

5 MoEBERT issues

I just removed the "--do_train" and "--do_eval" lines in bert_base_mnli_example.sh and added a "--do_predict" line. But when I run it, a "Need to turn the model to a MoE first" error happens....

Hi, from the paper I understood that the most important parameters are shared across the different experts. However, in the code I didn't see how to ensure the parameters are...

Hi @SimiaoZuo, you mentioned that we need to fine-tune first. But how do we obtain the fine-tuned model and pass it into `bert_base_mnli_example.sh`? Many thanks!

Hi @SimiaoZuo, I encountered problems when running `bash bert_base_mnli_example.sh`. The error message is below. Thanks very much! ``` /home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/distributed/launch.py:164: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead "The...

What is the performance difference between the token-level gate and the sentence-level gate? And what value of alpha is used for the load-balancing loss?
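For context on the load-balancing question above: MoE models commonly add an auxiliary loss that encourages the router to spread tokens evenly across experts, scaled by a coefficient alpha. The sketch below is a generic Switch-Transformer-style version in plain Python, not necessarily MoEBERT's exact implementation; the function name and the default alpha are illustrative assumptions.

```python
def load_balance_loss(router_probs, expert_assignment, alpha=0.01):
    """Generic MoE load-balancing auxiliary loss (a sketch, not
    necessarily MoEBERT's exact formulation).

    router_probs: per-token gate distributions, each summing to 1.
    expert_assignment: chosen expert index for each token.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])

    # f_i: fraction of tokens dispatched to expert i
    frac = [0.0] * num_experts
    for e in expert_assignment:
        frac[e] += 1.0 / num_tokens

    # P_i: mean gate probability assigned to expert i
    mean_prob = [sum(p[i] for p in router_probs) / num_tokens
                 for i in range(num_experts)]

    # alpha * E * sum_i f_i * P_i; minimized when routing is uniform
    return alpha * num_experts * sum(f * p for f, p in zip(frac, mean_prob))
```

With perfectly balanced routing the loss equals alpha; skewed routing (e.g. every token sent to one expert with high confidence) increases it, so the gradient pushes the router back toward uniform usage.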