MeZO
Cannot reproduce some results of OPT
Hi,
I have been trying to run the three MeZO training modes (full-parameter, LoRA, and prefix) on OPT-13B, following the instructions in the README. However, some of my results differ significantly from the ones reported in the paper.
Could you provide more details on the training setup, in particular the learning rates used? That would help me a lot in reproducing the reported results.
Thank you for your assistance!
Hi,
Would you mind providing more details on this (e.g., the hyperparameters/commands you used, the results you got)? Also, we do run a grid search for the downstream tasks and the hyperparameters are included in our appendix.
For example:
- "MODEL=facebook/opt-13b TASK=Copa MODE=lora LR=5e-5 (or 1e-4) EPS=1e-2 bash mezo.sh" → 84. The value in the paper is 87.
- "MODEL=facebook/opt-13b TASK=Copa MODE=prefix LR=1e-3 (or 1e-2) EPS=1e-1 bash mezo.sh" → 87. The value in the paper is 84. (Perhaps these two results were swapped in the paper.)
- "MODEL=facebook/opt-13b TASK=WSC MODE=ft LR=1e-7 (or 1e-6) EPS=1e-3 bash mezo.sh" → 61.5. The value in the paper is 63.5.
- "MODEL=facebook/opt-13b TASK=WIC MODE=ft LR=1e-7 (or 1e-6) EPS=1e-3 bash mezo.sh" → 58.8 (or 49.5). The value in the paper is 61.1.
- "MODEL=facebook/opt-13b TASK=SQuAD MODE=ft LR=1e-7 (or 1e-6) EPS=1e-3 bash mezo.sh" → 82.0 (or 80.8). The value in the paper is 84.7.
- "MODEL=facebook/opt-13b TASK=Copa MODE=lora LR=5e-5 (or 1e-4) EPS=1e-2 bash mezo.sh" → 26.9 (or 1.8). The value in the paper is 31.4.
Hi, thanks for reporting this issue!
After checking our raw experiment logs, we found that a copying error swapped the Copa LoRA and prefix results in the paper. Sorry for the confusion; we'll fix it on arXiv!
Regarding WSC/WIC, our experiment logs show that the reported numbers came from the following configs (LR/EPS):
- WSC 1e-7/1e-3
- WIC 1e-6/1e-3
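For convenience, the two configs above can be written out as full commands. This is only a minimal sketch: it assumes `mezo.sh` reads `MODEL`, `TASK`, `MODE`, `LR`, and `EPS` from the environment, as in the commands quoted earlier in this thread, and that `MODE=ft` is the mode used for these runs. `run_config` and `DRY_RUN` are hypothetical helpers for illustration, not part of the MeZO repo.

```shell
#!/usr/bin/env bash
# Sketch: print (or launch) the full-parameter MeZO runs for the configs listed above.
# run_config and DRY_RUN are hypothetical helpers, not part of the MeZO repo.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually launch it

run_config() {
  local task="$1" lr="$2" eps="$3"
  if [ "$DRY_RUN" = "1" ]; then
    # Show the command that would be run, in the same form used in this thread.
    echo "MODEL=facebook/opt-13b TASK=${task} MODE=ft LR=${lr} EPS=${eps} bash mezo.sh"
  else
    # Assumes mezo.sh is in the current directory and reads these variables.
    MODEL=facebook/opt-13b TASK="$task" MODE=ft LR="$lr" EPS="$eps" bash mezo.sh
  fi
}

run_config WSC 1e-7 1e-3
run_config WIC 1e-6 1e-3
```

With the default `DRY_RUN=1` the script only prints the two commands; setting `DRY_RUN=0` launches them one after the other.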
For SQuAD/DROP, we realized that we forgot to report the hyperparameter grid for these two generation tasks, which is slightly larger than the one used for the other tasks. Please see the screenshots below for the SQuAD and DROP grids. We sincerely apologize for the oversight, and thanks for reporting this!