
How to handle multiple images with Blip2 models?

Open rr191211 opened this issue 2 years ago • 1 comments

How can I handle multiple images with BLIP-2 models? I have a large number of VQA questions that each require more than one image to answer, i.e. one question paired with a set of images. Can I extract the features from each image in the set and then concatenate them as input to the Q-Former? Thanks.

rr191211 avatar Feb 21 '23 12:02 rr191211

Yes, you can do precisely that. The length of the input to the transformer can vary because the image features enter through cross-attention. However, fine-tuning the model is recommended so that it adapts to multiple images.
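
To illustrate why cross-attention tolerates a variable number of image tokens, here is a minimal toy sketch in NumPy. It is not LAVIS code: `cross_attention` and `encode_images` are hypothetical helpers, the features are random stand-ins for an image encoder's output, learned projections are omitted, and the sizes are arbitrary. It only shows that a fixed set of query tokens yields the same output shape whether it attends over one image's features or over the concatenated features of several.

```python
import numpy as np


def cross_attention(queries, keys_values):
    """Toy single-head scaled dot-product cross-attention (no learned projections)."""
    dim = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(dim)   # (num_queries, num_kv_tokens)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the image tokens
    return weights @ keys_values                      # (num_queries, dim)


rng = np.random.default_rng(0)

# Arbitrary toy sizes, chosen for illustration only.
NUM_QUERIES, TOKENS_PER_IMAGE, DIM = 32, 257, 64

# A fixed set of query tokens, shared across all inputs.
queries = rng.standard_normal((NUM_QUERIES, DIM))


def encode_images(num_images):
    """Hypothetical stand-in for an image encoder: random features per image."""
    return [rng.standard_normal((TOKENS_PER_IMAGE, DIM)) for _ in range(num_images)]


# Concatenate per-image features along the token axis.
one_image = np.concatenate(encode_images(1), axis=0)
three_images = np.concatenate(encode_images(3), axis=0)

out_one = cross_attention(queries, one_image)
out_three = cross_attention(queries, three_images)

print(one_image.shape, out_one.shape)      # (257, 64) (32, 64)
print(three_images.shape, out_three.shape)  # (771, 64) (32, 64)
```

The output always has one row per query token, so whatever consumes it sees the same shape regardless of how many images were concatenated. The sketch says nothing about answer quality: a model that only saw single images during training may not use the extra tokens well, which fits the suggestion to fine-tune.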

LiJunnan1992 avatar Feb 23 '23 02:02 LiJunnan1992

Hello, sorry, I know this issue has been closed. However, may I ask how you handled two or more images as inputs for visual QA with BLIP-2? Any public code or tips would be appreciated. Thanks.

1TTT9 avatar Aug 01 '24 15:08 1TTT9