
Standard FT does not work

Open YaNgZhAnG-V5 opened this issue 1 year ago • 3 comments

Hi, thanks for the great work!

When I tried to run your baseline evaluation script with:

TASK=SST-2 K=16 SEED=42 BS=8 LR=1e-5 MODEL=roberta-large bash finetune.sh

the script breaks during evaluation with this error message: TypeError: repeat(): argument 'repeats' (position 1) must be tuple of ints, but found element of type NoneType at pos 0

Can you check the standard FT script to see if there is any issue?

YaNgZhAnG-V5 avatar Jan 25 '24 12:01 YaNgZhAnG-V5

Hi, are you using a multi-GPU setup? Also, can you share your PyTorch/transformers versions?

gaotianyu1350 avatar Jan 29 '24 12:01 gaotianyu1350

Thanks for getting back to me! I am using a single-GPU setup. For the environment, I am using torch 2.1.2+cu118 and transformers 4.37.1.

YaNgZhAnG-V5 avatar Jan 30 '24 02:01 YaNgZhAnG-V5

Hi, can you try transformers==4.28.1? This is the version of transformers that we used to test the codebase.
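
For reference, the suggested pin can be applied to an existing environment with a standard pip command (this exact invocation is not from the original thread):

pip install transformers==4.28.1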

gaotianyu1350 avatar Feb 06 '24 12:02 gaotianyu1350

I had the same issue with transformers==4.44.2. The problem is with the get_eval_dataloader function in the Trainer class: the returned eval_dataloader has its batch_size attribute set to None, while eval_dataloader.batch_sampler.batch_size holds the correct batch size. I fixed it by changing the batch_size variable to dataloader.batch_sampler.batch_size in the prediction_loop() function of transformers' Trainer class. I am not sure whether there is a better way to fix this bug without modifying the transformers library.
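
A minimal sketch of the workaround described above, kept outside the transformers library: a small helper (the name resolve_batch_size is hypothetical, not from the thread) that falls back to the batch_sampler when the DataLoader's batch_size is None. It could be called from a custom/subclassed trainer wherever dataloader.batch_size is currently read, instead of editing prediction_loop() in place; this is an assumption about how one might integrate it, not the fix used in the repository.

from torch.utils.data import DataLoader

def resolve_batch_size(dataloader: DataLoader) -> int:
    # Newer transformers/accelerate versions build the eval DataLoader with a
    # batch_sampler, which leaves dataloader.batch_size as None; in that case
    # the per-step batch size lives on the batch_sampler instead.
    if dataloader.batch_size is not None:
        return dataloader.batch_size
    batch_sampler = getattr(dataloader, "batch_sampler", None)
    if batch_sampler is not None and getattr(batch_sampler, "batch_size", None) is not None:
        return batch_sampler.batch_size
    raise ValueError("Could not determine the evaluation batch size from the DataLoader.")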

aparna-aketi avatar Oct 18 '24 21:10 aparna-aketi