Blip2 vicuna instruct

Open kttian opened this issue 1 year ago • 3 comments

Do you have a train config for blip2 vicuna instruct?

Currently, using a vqa dataset with "blip_question" text processors and a vqa task, I encounter an error at this line (https://github.com/salesforce/LAVIS/blob/main/lavis/models/blip2_models/blip2_vicuna_instruct.py#L195) where 'text_output' does not exist (only 'text_input' does).

Thanks!

kttian avatar May 18 '23 08:05 kttian

I think you should reimplement the VQA dataset so that 'text_output' exists in each training sample.

iamwangyabin avatar May 18 '23 12:05 iamwangyabin

That could work - what is the 'text_output' field intended to represent?

But also, I mostly want to replicate the authors' training first! Is it right that none of the datasets (e.g. okvqa) currently has a 'text_output' field?

kttian avatar May 18 '23 16:05 kttian

Thanks for your question. Yes, you need to reimplement the VQA dataset. We suggest writing a wrapper class around the existing dataset classes. The "text_input" field returns the instruction (e.g. "Question: {question} Answer:"). The "text_output" field returns the answer.

LiJunnan1992 avatar May 18 '23 23:05 LiJunnan1992
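The wrapper suggested above could look roughly like the sketch below. It is only an illustration, not the authors' training code: the class name `InstructVQAWrapper` is hypothetical, and it assumes the wrapped dataset returns dicts with "image", "text_input" (the raw question), "answers" and "weights". Picking the highest-weight answer as "text_output" is also an assumption; the thread does not confirm which answer was used. A real setup would additionally need a collater that batches these fields.

```python
class InstructVQAWrapper:
    """Hypothetical wrapper: adds 'text_output' to samples of a VQA-style dataset."""

    def __init__(self, base_dataset, prompt="Question: {question} Answer:"):
        self.base = base_dataset  # any indexable dataset returning dicts
        self.prompt = prompt      # instruction template from the reply above

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        sample = self.base[index]
        answers, weights = sample["answers"], sample["weights"]
        # Assumption: use the answer with the largest weight as the target.
        best = max(range(len(answers)), key=lambda i: weights[i])
        return {
            "image": sample["image"],
            # The instruction the model is conditioned on.
            "text_input": self.prompt.format(question=sample["text_input"]),
            # The text the model is trained to generate.
            "text_output": answers[best],
        }


# Toy stand-in for an existing dataset, so the sketch runs on its own.
toy = [
    {
        "image": None,
        "text_input": "What color is the bus?",
        "answers": ["red", "orange"],
        "weights": [0.7, 0.3],
    }
]
wrapped = InstructVQAWrapper(toy)
item = wrapped[0]
```

Here `item["text_input"]` holds the formatted instruction and `item["text_output"]` holds a single answer string, which are the two fields the model's forward pass reads.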

So did you just select one of the answers to be the text_output, since vqa has lots of possible answers?

gordonhu608 avatar Jun 03 '23 07:06 gordonhu608
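The thread does not say how the authors chose among multiple ground-truth answers. One option, shown here purely as an assumption, is to sample a single answer per training step in proportion to its weight, so that frequent answers are seen more often while rare ones are not dropped entirely. The helper name `sample_answer` is hypothetical.

```python
import random


def sample_answer(answers, weights, rng=random):
    """Draw one answer string, with probability proportional to its weight."""
    return rng.choices(answers, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the sketch reproducible
picked = sample_answer(["red", "orange"], [0.7, 0.3], rng)
certain = sample_answer(["red", "orange"], [1.0, 0.0], rng)
```

Calling this inside a dataset's `__getitem__` would yield a possibly different "text_output" each epoch for questions with several annotated answers.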

So have you experimented with this? What is the format of 'text_output'? Also, it is not clear to me whether 'answers' and 'weights' are still necessary for InstructBLIP. Thank you!

dydxdt avatar Aug 29 '23 03:08 dydxdt

@dydxdt Hi, may I ask if you have experimented with this? I am also confused about the "answers" and "weights" when using blip2 or instructblip. Thanks!

Sammy42779 avatar Oct 25 '23 01:10 Sammy42779