How can I fine-tune for the image-text retrieval task?

Open Camellia-tx opened this issue 3 years ago • 0 comments

python oscar/run_retrieval.py \
    --model_name_or_path vinvl/coco_ir/base/checkpoint-1340000 \
    --do_train \
    --do_lower_case \
    --evaluate_during_training \
    --num_captions_per_img_val 20 \
    --eval_caption_index_file minival_caption_indexs_top20.pt \
    --per_gpu_train_batch_size 16 \
    --learning_rate 0.00002 \
    --num_train_epochs 30 \
    --weight_decay 0.05 \
    --save_steps 5000 \
    --add_od_labels \
    --od_label_type vg \
    --max_seq_length 70 \
    --max_img_seq_length 70 \
    --output_dir output/

I ran the command above on a single GPU, but training did not converge. How should I set num_train_epochs, per_gpu_train_batch_size, and learning_rate if I can use at most three GPUs? Thanks for your help!
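For context, here is a minimal sketch of the arithmetic behind this question, assuming the common linear scaling heuristic (learning rate proportional to the effective batch size). The reference GPU count `REF_N_GPU = 8` is a hypothetical placeholder, not a value confirmed by the repository, and the helper names are made up for illustration. Gradient accumulation is included only as a general technique; whether the script exposes it is not stated here.

```python
def effective_batch_size(per_gpu_batch_size, n_gpu, grad_accum_steps=1):
    """Number of samples contributing to each optimizer step."""
    return per_gpu_batch_size * n_gpu * grad_accum_steps


def scaled_learning_rate(ref_lr, ref_batch, new_batch):
    """Linear scaling heuristic: keep lr / batch_size constant."""
    return ref_lr * new_batch / ref_batch


# Values taken from the command above.
PER_GPU_BATCH = 16
REF_LR = 2e-5

# Hypothetical assumption: the reference setup used 8 GPUs.
REF_N_GPU = 8
ref_batch = effective_batch_size(PER_GPU_BATCH, REF_N_GPU)

for n_gpu in (1, 2, 3):
    batch = effective_batch_size(PER_GPU_BATCH, n_gpu)
    lr = scaled_learning_rate(REF_LR, ref_batch, batch)
    print(f"n_gpu={n_gpu}  effective_batch={batch}  suggested_lr={lr:.2e}")
```

Under these assumptions, fewer GPUs at the same per-GPU batch size mean a smaller effective batch, so one would either lower the learning rate proportionally or accumulate gradients to recover the reference batch size. This is a heuristic starting point, not a guaranteed recipe for convergence.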

Mar 15 '22 12:03 Camellia-tx