Cannot reproduce the results for RoBERTa on SST-2
Hello,
Thank you for your fantastic work. When I ran the code for RoBERTa on SST-2, I could not reproduce the reported results.
For instance, for full-parameter fine-tuning (FT) with Adam, running
TASK=SST-2 K=16 SEED=42 BS=8 LR=5e-5 MODEL=roberta-large bash finetune.sh
gives val_acc = 0.90625 and test_acc = 0.84518.
When I try smaller learning rates, the results are worse.
But Table 16 in the paper reports an accuracy of 91.9. Is that measured on the test set or the validation set? If it is the test accuracy, it seems difficult to reproduce.
The results for MeZO are even worse: eval_acc is below 0.8 when I run
TASK=SST-2 K=16 SEED=42 BS=64 LR=1e-5 EPS=1e-3 MODEL=roberta-large bash mezo.sh
Could the authors provide more details on the best hyperparameters? I would be very grateful.
Best regards,
Hi,
The example scripts we provide do not necessarily use the best hyperparameter settings; they are just settings that usually give good results, so they are a good starting point. To reproduce the results, please follow Appendix D.3 of our paper and run the complete grid search.
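In case it helps others landing here, below is a minimal sketch of what such a grid search could look like with the provided mezo.sh script. The seed and learning-rate values are placeholders chosen for illustration, not the exact grid from Appendix D.3; substitute the values listed there.

# Sketch of a hyperparameter sweep over the env vars mezo.sh reads.
# The grids below are placeholders -- use the values from Appendix D.3.
TASK=SST-2 K=16 MODEL=roberta-large
for SEED in 42 21 13; do                  # placeholder seeds
  for LR in 1e-7 1e-6 1e-5; do            # placeholder learning-rate grid
    TASK=$TASK K=$K SEED=$SEED BS=64 LR=$LR EPS=1e-3 MODEL=$MODEL bash mezo.sh
  done
done

You would then pick the setting with the best validation accuracy and report its test accuracy.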