Oscar
Cannot replicate VinVL VQA results
I was fine-tuning on VQA with VinVL features using the given scripts. However, I am getting 74.82 evaluation accuracy, which is 1.3 points lower than the reported one (76.12). It would be helpful if anyone could point me to possible reasons for this. I am training on 4 GPUs; here is my training script.
python oscar/run_vqa.py -j 4 \
    --img_feature_dim 2054 --max_img_seq_length 50 \
    --data_label_type mask --img_feature_type faster_r-cnn \
    --data_dir vinvl/datasets/vqa --txt_data_dir vinvl/datasets/vqa \
    --model_type bert --model_name_or_path vinvl/model_ckpts/vqa/base/checkpoint-2000000 \
    --task_name vqa_text --do_train --do_lower_case --max_seq_length 128 \
    --per_gpu_eval_batch_size 256 --per_gpu_train_batch_size 32 \
    --learning_rate 5e-05 --num_train_epochs 25 \
    --output_dir results/vqa --label_file vinvl/datasets/vqa/trainval_ans2label.pkl \
    --save_epoch 1 --seed 88 --evaluate_during_training --logging_steps 4000 \
    --drop_out 0.3 --weight_decay 0.05 --warmup_steps 0 --loss_type bce \
    --img_feat_format pt --classifier linear --cls_hidden_scale 3
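One thing worth double-checking when replicating on a different number of GPUs is the effective batch size, since the script specifies a per-GPU value. A minimal sketch (the helper name is mine, not from the repo):

```python
# Hypothetical helper: effective batch size in data-parallel training is
# per-GPU batch size x number of GPUs x gradient-accumulation steps.
def effective_batch_size(per_gpu_batch_size, num_gpus, grad_accum_steps=1):
    return per_gpu_batch_size * num_gpus * grad_accum_steps

# The script above uses --per_gpu_train_batch_size 32 on 4 GPUs:
print(effective_batch_size(32, 4))  # 128
```

Running the same script on a different GPU count changes this value, which can shift the final accuracy unless the learning rate is adjusted accordingly.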
Hi,
First, 74.82 is on the 2k val set, while 76.12 is on the (server) test-std set, for which we do not have ground truth, so the two are not comparable. Second, we released our full numbers on test-dev and test-std on the leaderboard server evaluation.
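Since test-std has no public ground truth, the numbers come from uploading predictions to the challenge server. A minimal sketch of packaging predictions for upload (the exact file name is mine, and the JSON layout shown is the commonly used VQA-challenge format, so verify it against the challenge page):

```python
import json

# Predictions for server evaluation: a JSON list of
# {"question_id": ..., "answer": ...} entries (format is an assumption here).
predictions = [
    {"question_id": 1000, "answer": "yes"},  # made-up example entries
    {"question_id": 1001, "answer": "2"},
]

with open("vqa_submission.json", "w") as f:
    json.dump(predictions, f)
```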
@Lizw14 Hi, may I ask could you please provide mask-rcnn labels of the test-dev and test-std set? Thank you very much!
Hello, did you get these two files? If you generated them yourself, which model did you use, and how well did it work?