How to train OFA for open-ended VQA?

Open qyc-98 opened this issue 3 years ago • 6 comments

Dear authors: Thanks for the great work! In VQA validation, I want the model to predict the most likely next token (i.e. generate one token of the answer) from the output logits, then append that token to the input and repeat this step until the model predicts ⟨EOS⟩. How can I achieve this? Thanks a lot!
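The loop described above is greedy autoregressive decoding. A minimal sketch of it follows, in plain Python and independent of the OFA codebase. The names `greedy_decode`, `next_token_logits`, and `toy_logits` are hypothetical, introduced only for illustration; in practice `next_token_logits` would wrap a forward pass of the model on the image, the question, and the tokens generated so far.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 20,
) -> List[int]:
    """Generate token ids one at a time, always taking the argmax.

    `next_token_logits` is a hypothetical stand-in for the model: given the
    tokens produced so far, it returns one score per vocabulary entry.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        logits = next_token_logits(tokens)
        # Pick the index of the highest-scoring token.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        # Append it to the running input for the next step.
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[1:]  # drop the BOS token


def toy_logits(prefix: List[int]) -> List[float]:
    """Toy 'model' over a 5-token vocabulary: BOS(1) -> 3 -> 4 -> EOS(2)."""
    script = {1: 3, 3: 4, 4: 2}
    target = script[prefix[-1]]
    return [1.0 if i == target else 0.0 for i in range(5)]


answer = greedy_decode(toy_logits, bos_id=1, eos_id=2)
print(answer)
```

The `max_len` cap is there so that decoding terminates even if the model never emits ⟨EOS⟩. Beam search generalizes this loop by keeping several candidate prefixes per step instead of one.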

qyc-98 avatar Jun 10 '22 16:06 qyc-98

And I want to train and validate both in this manner. Thanks for your precious time!

qyc-98 avatar Jun 11 '22 05:06 qyc-98

Hi, the VQA task code currently supports beam-search inference during validation and testing (as opposed to all-candidate inference; please refer to the README), but the finetuning objective must still be constrained by a pre-defined candidate answer set stored in the trainval_ans2label.pkl file. We are working on a new config to support unconstrained finetuning, which does not need a pre-defined candidate answer set. The code update is still being tested and will be merged this week.
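To illustrate what "constrained by a candidate answer set" can mean at the token level, here is a rough sketch. It assumes a prefix trie over the tokenized candidate answers, which is one common way to restrict generation; it is not taken from the OFA code, and `Trie`, `allowed_next`, and `mask_logits` are hypothetical names. Unconstrained finetuning would correspond to skipping the masking step entirely.

```python
from typing import Iterable, List, Sequence


class Trie:
    """Prefix tree over token-id sequences (hypothetical helper)."""

    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, token_ids: Iterable[int]) -> None:
        node = self.root
        for t in token_ids:
            node = node.setdefault(t, {})

    def allowed_next(self, prefix: Iterable[int]) -> List[int]:
        """Token ids that extend `prefix` toward some candidate answer."""
        node = self.root
        for t in prefix:
            if t not in node:
                return []
            node = node[t]
        return sorted(node)


def mask_logits(logits: Sequence[float], allowed: Iterable[int]) -> List[float]:
    """Set every token outside `allowed` to -inf so it can never be chosen."""
    allowed_set = set(allowed)
    return [x if i in allowed_set else float("-inf") for i, x in enumerate(logits)]


# Two made-up candidate answers as token ids, each ending with EOS (id 2).
trie = Trie()
trie.insert([3, 2])
trie.insert([4, 5, 2])

first_step = trie.allowed_next([])    # tokens that may start an answer
second_step = trie.allowed_next([4])  # tokens that may follow token 4
masked = mask_logits([0.1, 0.2, 0.3, 0.9, 0.4, 0.0], first_step)
```

With the mask applied, any answer the model produces is guaranteed to be one of the candidates; without it, the model can generate arbitrary token sequences.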

yangapku avatar Jun 14 '22 08:06 yangapku

@yangapku Hi, any updates on this? Thanks!

ilovecv avatar Jul 07 '22 06:07 ilovecv

Hi, a pull request related to this issue, #124, was proposed recently; it adds a new config to activate unconstrained finetuning. However, we have found that this PR still contains bugs that result in a zero score during evaluation. We are working on making it function correctly and will merge it ASAP.

yangapku avatar Jul 22 '22 09:07 yangapku

Hi, thanks for your great work!

qyc-98 avatar Jul 26 '22 08:07 qyc-98

Any update on this?

RishabhMaheshwary avatar Aug 09 '22 20:08 RishabhMaheshwary

@qyc-98 @RishabhMaheshwary @ilovecv Hi, we have found the bug and fixed it! The latest codebase now supports open-ended (unconstrained) VQA finetuning and evaluation. Please pull the latest code and refer to PR #124 and run_scripts/vqa/train_vqa_distributed.sh (lines 62-68) for how to activate it!

yangapku avatar Sep 21 '22 06:09 yangapku

Hi, is there any performance data for open-ended VQA fine-tuning?

leng-yue avatar Sep 21 '22 08:09 leng-yue

@leng-yue We have tested open-ended VQA fine-tuning on OFA-base (without using EMA). It achieves a score of 76.4 on our VQA validation set. This performance can still be improved by using EMA and further hyperparameter tuning.

yangapku avatar Sep 21 '22 09:09 yangapku

Thanks for your response. The result looks good :)

leng-yue avatar Sep 22 '22 00:09 leng-yue