
How to do few-shot in-context learning with BLIP2/InstructBLIP?

Open ys-zong opened this issue 1 year ago • 3 comments

Hi, thanks for the great work!

I wonder how I can prompt BLIP2 and InstructBLIP to do few-shot in-context learning, e.g. few-shot VQA. Specifically, I want the input to look like [Img] [QA1] [Img] [QA2] ... [Img][Qn] --> Answer.

I saw issue #433 about how to prompt with <Img, Q1, A1, Q2, ?>. The difference here is: how can I input multiple images? Many thanks!

ys-zong avatar Sep 15 '23 21:09 ys-zong

I'm also wondering if we can feed multiple images

jeeyung avatar Oct 20 '23 00:10 jeeyung

Have you found a method? I also want to run a few-shot experiment.

Fym68 avatar Mar 29 '24 08:03 Fym68

FYI: I didn't find a neat way to do few-shot prompting with BLIP2, but I implemented few-shot inference for many other V-L models here: https://github.com/ys-zong/VL-ICL
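For anyone landing here later: BLIP-2 was trained on single image-text pairs rather than interleaved sequences, so in-context performance with multiple images may be weak, but mechanically you can build a few-shot input by concatenating each image's projected Q-Former query tokens with the text embeddings of its QA pair, in order, before the frozen LLM. Below is a toy sketch of that interleaving with dummy stand-in embeddings; `embed_image`, `embed_text`, and `build_interleaved_inputs` are hypothetical names, not LAVIS API.

```python
# Toy sketch of interleaved [Img][QA1] ... [Img][Qn] input construction.
# Real BLIP-2 pipeline: vision encoder -> Q-Former (a fixed number of
# query tokens per image; 32 in the released checkpoints) -> linear
# projection into the LLM's embedding space. Both "encoders" here are
# stand-ins that return dummy vectors of the right sequence length.
NUM_QUERY_TOKENS = 32   # per image, as in the released BLIP-2 models
HIDDEN = 8              # toy hidden size, for illustration only

def embed_image(image):
    """Stand-in for vision encoder + Q-Former + projection."""
    return [[0.0] * HIDDEN for _ in range(NUM_QUERY_TOKENS)]

def embed_text(text):
    """Stand-in for the LLM's token-embedding lookup (one 'token' per word)."""
    return [[0.0] * HIDDEN for _ in text.split()]

def build_interleaved_inputs(shots, query_image, question):
    """Concatenate image and text embeddings along the sequence axis."""
    seq = []
    for image, qa_text in shots:       # each shot: [Img][Q_i A_i]
        seq += embed_image(image)
        seq += embed_text(qa_text)
    seq += embed_image(query_image)    # final [Img][Qn], answer left open
    seq += embed_text(question)
    return seq  # would be fed to the LLM as its input embedding sequence

shots = [
    (None, "Question: what color is the car? Answer: red."),
    (None, "Question: how many dogs are there? Answer: two."),
]
seq = build_interleaved_inputs(shots, None, "Question: what is shown? Answer:")
print(len(seq))  # 3 images * 32 query tokens + 21 text tokens = 117
```

The key design point is that nothing forces a single image slot: the LLM only sees one long embedding sequence, so any number of (image, text) segments can be concatenated, memory permitting. Models actually trained on interleaved data (e.g. OpenFlamingo) tend to do much better at this than BLIP-2.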

ys-zong avatar Mar 29 '24 08:03 ys-zong