How to train OFA for open-ended VQA?
Dear authors: Thanks for the great work! During VQA validation, I would like the model to predict the most likely next token (i.e. generate one token of the answer) from the output logits, append that token to the input, and repeat until the model predicts ⟨EOS⟩. What should I do to achieve this? Thanks a lot!
I would also like to both train and validate in this manner. Thanks for your precious time!
Hi, currently the VQA task code supports beam-search inference during validation and testing (in contrast with all-candidate inference; please refer to the readme), but the finetuning objective must still be constrained to a pre-defined candidate answer set stored in the trainval_ans2label.pkl file. We are working on a new config to support unconstrained finetuning (which does not require a pre-defined candidate answer set). The code update is still under testing and will be merged this week.
@yangapku Hi, any updates on this? Thanks!
Hi, a pull request related to this issue, #124, was proposed recently, which adds a new config to activate unconstrained finetuning. However, we have found that bugs still exist in this PR which result in a zero score during evaluation. We are working on fixing them and will merge the PR ASAP.
Hi, thanks for your great work!
Any update on this?
@qyc-98 @RishabhMaheshwary @ilovecv Hi, we have found the bug and fixed it! The latest codebase now supports open-ended (unconstrained) VQA finetuning and evaluation. Please pull the latest code and refer to PR #124 and run_scripts/vqa/train_vqa_distributed.sh (lines 62-68) for how to activate it!
Hi, are there any performance data for the open-ended VQA fine-tuning?
@leng-yue We have tested open-ended VQA fine-tuning on OFA-base (without using EMA). It achieves a 76.4 score on our VQA validation set. This can still be improved by using EMA and further hyper-parameter tuning.
Thanks for your response, the result looks good :)