MeZO issues

Impact of Dropout?

1

Hello and thank you for sharing this interesting approach. I have one question regarding dropout. If I understand the published code correctly, MeZO was tested with dropout deactivated, e.g. lines...

helpmefindaname

Results of Trec dataset on Roberta-large(K=512) with MeZO(LoRA)

8

I used the grid search below but couldn't reproduce the result from the paper. (I have updated the code for WD and successfully reproduced the result on SST2.) `TASK=trec K=512...

Yanjun-Zhao

Add a pip-installable, simple implementation of MeZO (along with a distributed impl. and some tests)

3

Hello there! I was very interested in your work after seeing it at NeurIPS. I'd like to play around with it a bit in the future. In order to do...

lebrice

In which file is the code implemented by the algorithm?

1

I'm a beginner, so I would like to know which file contains the implementation of the algorithm from the paper.

1llss

Zero Order implementation does not converge in CIFAR-10 dataset.

1

Hi, I have implemented MeZO on the CIFAR-10 dataset, but the model does not seem to be converging even after many epochs; by taking appropriate parameters it seems to be going...

amritansh6

Standard FT does not work

3

Hi, thanks for the great work! When I tried to run your baseline evaluation script with `TASK=SST-2 K=16 SEED=42 BS=8 LR=1e-5 MODEL=roberta-large bash finetune.sh`, the script breaks...

YaNgZhAnG-V5

Getting a RuntimeError after training with mezo

6

Hello, thank you for sharing your work! I'm getting the error below after training with the mezo.sh script: RuntimeError: Default process group has not been initialized, please make sure to...

sowmaster

max_seq_length and max_seq_len confusion

1

Hi team! Are these two, `max_seq_length` and `max_seq_len`, supposed to be the same parameter? https://github.com/princeton-nlp/MeZO/blob/552cb1b710767f9a6e1dc8f9645d7640376f9941/medium_models/run_fewshot.sh#L37 https://github.com/princeton-nlp/MeZO/blob/552cb1b710767f9a6e1dc8f9645d7640376f9941/medium_models/run_fewshot.sh#L91 Only `max_seq_length` is referenced in the script. Not sure if it's a bug.

davidqqq

How to use MeZO in training a simple CIFAR-10 model

3

Hi, thank you for sharing such amazing work. To use MeZO more easily, could you provide a minimal demo showing how we can use MeZO as an optimizer...

Cascol-Chen

Cannot reproduce some results of OPT

3

Hi, I have been attempting to run the three training implementations of MeZO on OPT-13B, as instructed in the README file. However, I have noticed significant differences in some...

WangFei-2019
