LAVIS

Question about the VQA-v2 training details for BLIP2

Open runzeer opened this issue 1 year ago • 5 comments

Hi, thanks for your excellent work! While fine-tuning the pretrained model weights on the VQA-v2 dataset, I ran into a question. Your paper says that the extracted image features and the input question are concatenated as the input of the Q-Former. [image]

However, I noticed that in your code the Q-Former's word_embedding and position embedding layers are both set to None, so I wonder how you implemented this part. [image] Should I remove these three lines?
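To make the concern concrete, here is a minimal sketch in plain Python. It is not LAVIS code: `ToyEmbeddings`, `toy_lookup`, and the idea that text tokens are appended to a list of query vectors are all assumptions made only for illustration. It shows that a module whose word embedding layer is None can still pass query vectors through, but cannot embed a question.

```python
from typing import Callable, List, Optional

Vector = List[float]


class ToyEmbeddings:
    """Hypothetical stand-in for an embedding module (not the LAVIS class)."""

    def __init__(self, word_embeddings: Optional[Callable[[List[int]], List[Vector]]]):
        # May be None, mirroring the situation described in the question.
        self.word_embeddings = word_embeddings

    def forward(self, query_embeds: List[Vector],
                input_ids: Optional[List[int]] = None) -> List[Vector]:
        if input_ids is None:
            # No text given: only the query vectors are returned.
            return list(query_embeds)
        if self.word_embeddings is None:
            # Text given, but there is no layer left to embed it.
            raise RuntimeError("text tokens passed but word embedding layer is None")
        # Text given and embeddable: append token vectors to the query vectors.
        return list(query_embeds) + self.word_embeddings(input_ids)


def toy_lookup(input_ids: List[int]) -> List[Vector]:
    """Hypothetical embedding lookup: one 2-d vector per token id."""
    return [[float(i), 0.0] for i in input_ids]


queries = [[1.0, 1.0], [2.0, 2.0]]
with_text = ToyEmbeddings(toy_lookup)
without_text = ToyEmbeddings(None)

joint = with_text.forward(queries, [5, 7])    # query vectors plus token vectors
queries_only = without_text.forward(queries)  # still works without text

try:
    without_text.forward(queries, [5, 7])
    text_failed = False
except RuntimeError:
    text_failed = True
```

If the real model behaves like this toy, the question text could only be used after keeping (and training or loading) the embedding layers, which is why I am asking about those lines.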

runzeer, Mar 17 '23 09:03