MeZO
Standard FT does not work
Hi, thanks for the great work!
When I tried to run your baseline evaluation script with:
TASK=SST-2 K=16 SEED=42 BS=8 LR=1e-5 MODEL=roberta-large bash finetune.sh
the script breaks during evaluation with this error message: TypeError: repeat(): argument 'repeats' (position 1) must be tuple of ints, but found element of type NoneType at pos 0
Can you check the standard FT script to see if there is any issue?
Hi, are you using multi-gpu setup? Also, can you share your pytorch/transformers versions?
Thanks for getting back to me! I am using a single-GPU setup. My environment is torch 2.1.2+cu118 and transformers 4.37.1.
Hi, can you try transformers==4.28.1, this is the version of transformers that we used to test the code base.
I had the same issue with transformers==4.44.2. The problem is in the get_eval_dataloader function of the Trainer class: the returned eval_dataloader has its batch_size attribute set to None, while eval_dataloader.batch_sampler.batch_size holds the correct batch size. I fixed it by changing the batch_size variable to dataloader.batch_sampler.batch_size in the prediction_loop() function of the transformers Trainer class. I am not sure whether there is a better way to fix this bug without modifying the transformers library.
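The workaround above amounts to a fallback: read dataloader.batch_size, and if it is None, use dataloader.batch_sampler.batch_size instead. This is consistent with how PyTorch behaves: a torch.utils.data.DataLoader constructed with a custom batch_sampler sets its own batch_size attribute to None. The following is a minimal sketch of that fallback as a standalone helper. The name resolve_eval_batch_size is hypothetical (it is not part of transformers or this repository), and the demo uses SimpleNamespace stand-ins rather than real DataLoader objects.

```python
from types import SimpleNamespace


def resolve_eval_batch_size(dataloader):
    """Return the per-step batch size of a DataLoader-like object.

    Hypothetical helper. A torch DataLoader built with batch_sampler=...
    reports batch_size=None, so fall back to the sampler's batch_size.
    """
    # Usual case: the loader was built with batch_size=...
    batch_size = getattr(dataloader, "batch_size", None)
    if batch_size is not None:
        return batch_size

    # Fallback: the size lives on the batch sampler instead.
    batch_sampler = getattr(dataloader, "batch_sampler", None)
    batch_size = getattr(batch_sampler, "batch_size", None)
    if batch_size is None:
        raise ValueError("could not determine batch size from dataloader")
    return batch_size


# Stand-in for a loader built with a custom batch_sampler (batch_size is None).
loader_with_sampler = SimpleNamespace(
    batch_size=None, batch_sampler=SimpleNamespace(batch_size=8)
)
print(resolve_eval_batch_size(loader_with_sampler))  # prints 8
```

To use it, one would call this helper where prediction_loop() assigns its batch_size variable, instead of reading dataloader.batch_size directly. Pinning the tested version, transformers==4.28.1, is the alternative that avoids editing the library at all.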