Cannot reproduce the results for RoBERTa on SST-2
Hello,
Thank you for your fantastic work. When I ran the code for RoBERTa on SST-2, I could not reproduce the reported results.
For instance, for full-parameter fine-tuning (FT) with Adam, running
TASK=SST-2 K=16 SEED=42 BS=8 LR=5e-5 MODEL=roberta-large bash finetune.sh
gives val_acc = 0.90625 and test_acc = 0.84518.
When I try smaller learning rates, the results are worse.
But Table 16 in the paper reports an accuracy of 91.9. Is that measured on the test set or the validation set? If it is the test accuracy, it seems difficult to reproduce.
The results for MeZO are even worse: eval_acc is below 0.8 when I run
TASK=SST-2 K=16 SEED=42 BS=64 LR=1e-5 EPS=1e-3 MODEL=roberta-large bash mezo.sh
Could the authors provide more details on the best hyperparameters? I would be very grateful.
Best regards,
Hi,
The example scripts we provide do not necessarily use the best hyperparameter settings; they are just settings that usually give good results, so they are a good starting point. To reproduce the results, please follow Appendix D.3 of our paper and run the complete grid search.
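In case it helps others landing here, below is a minimal sketch of what such a grid search could look like with the provided mezo.sh script. The seed and learning-rate values are placeholders chosen for illustration, not the exact grid from Appendix D.3; substitute the values listed there.

# Sketch of a hyperparameter sweep over the env vars mezo.sh reads.
# The grids below are placeholders -- use the values from Appendix D.3.
TASK=SST-2 K=16 MODEL=roberta-large
for SEED in 42 21 13; do                  # placeholder seeds
  for LR in 1e-7 1e-6 1e-5; do            # placeholder learning-rate grid
    TASK=$TASK K=$K SEED=$SEED BS=64 LR=$LR EPS=1e-3 MODEL=$MODEL bash mezo.sh
  done
done

You would then pick the setting with the best validation accuracy and report its test accuracy.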