MeZO
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
Hello, and thank you for sharing this interesting approach. I have one question regarding dropout. If I understand the published code correctly, MeZO was tested with dropout deactivated, e.g. lines...
I used the grid search below but couldn't reproduce the results from the paper. (I have updated the code for WD and successfully reproduced the result on SST2.) `TASK=trec K=512...
Add a pip-installable, simple implementation of MeZO (along with a distributed impl. and some tests)
Hello there! I was very interested in your work after seeing it at NeurIPS. I'd like to play around with it a bit in the future. In order to do...
In fact, I'm a beginner, so I would like to know which file contains the implementation of the algorithm from the paper.
Hi, I have implemented MeZO on the CIFAR-10 dataset, but the model does not seem to converge even after many epochs. By taking appropriate parameters it seems to be going...
Hi, thanks for the great work! When I tried to run your baseline evaluation script with: ``` TASK=SST-2 K=16 SEED=42 BS=8 LR=1e-5 MODEL=roberta-large bash finetune.sh ``` the script will break...
Hello, Thank you for sharing your work! I'm getting the error below after training with the mezo.sh script: RuntimeError: Default process group has not been initialized, please make sure to...
Hi team! Are `max_seq_length` and `max_seq_len` supposed to be the same parameter? https://github.com/princeton-nlp/MeZO/blob/552cb1b710767f9a6e1dc8f9645d7640376f9941/medium_models/run_fewshot.sh#L37 https://github.com/princeton-nlp/MeZO/blob/552cb1b710767f9a6e1dc8f9645d7640376f9941/medium_models/run_fewshot.sh#L91 Only `max_seq_length` is referenced in the script. Not sure if it's a bug.
Hi, thank you for sharing such amazing work. To make MeZO easier to use, could you provide a minimal demo showing how we can use MeZO as an optimizer...
Hi, I have been attempting to run the three training implementations of MeZO on OPT-13B, as instructed in the README file. However, I have noticed significant differences in some...
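Several of the requests above ask for a minimal demo of MeZO used as an optimizer. The following is only a rough sketch of the idea, not the repository's code: it uses plain Python lists and the standard `random` module in place of a language model, and the names `perturb`, `mezo_step`, `toy_loss`, and `run_demo` are hypothetical. It assumes the usual two-forward-pass estimate, in which the random direction `z` is regenerated from a seed rather than stored, so no extra copy of the parameters is needed.

```python
import random


def perturb(params, seed, scale):
    """In place: params += scale * z, with z ~ N(0, 1) regenerated from `seed`."""
    rng = random.Random(seed)
    for i in range(len(params)):
        params[i] += scale * rng.gauss(0.0, 1.0)


def mezo_step(params, loss_fn, lr=0.01, eps=1e-3, seed=0):
    """One zeroth-order update that only evaluates loss_fn (forward passes)."""
    perturb(params, seed, +eps)            # theta + eps * z
    loss_plus = loss_fn(params)
    perturb(params, seed, -2.0 * eps)      # theta - eps * z
    loss_minus = loss_fn(params)
    perturb(params, seed, +eps)            # back to theta

    # Scalar estimate of the directional derivative along z.
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)

    # theta <- theta - lr * projected_grad * z, reusing the same seed for z.
    perturb(params, seed, -lr * projected_grad)
    return projected_grad


def toy_loss(params):
    """Toy stand-in for a model loss: squared distance to the all-ones vector."""
    return sum((p - 1.0) ** 2 for p in params)


def run_demo(steps=2000):
    params = [0.0, 0.0, 0.0, 0.0]
    initial = toy_loss(params)
    for step in range(steps):
        # A fresh seed per step gives a fresh random direction.
        mezo_step(params, toy_loss, lr=0.01, eps=1e-3, seed=step)
    return initial, toy_loss(params)


if __name__ == "__main__":
    initial, final = run_demo()
    print(f"loss before: {initial:.4f}, loss after: {final:.3e}")
```

In a real setting `loss_fn` would be a forward pass of the model on a mini-batch and the perturbation would be applied to the model's tensors. The learning rate and `eps` here are chosen for the toy problem only and are not the values used in the paper.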