LAVIS
Blip2 vicuna instruct
Do you have a train config for blip2 vicuna instruct?
Currently, using a vqa dataset with "blip_question" text processors and a vqa task, I encounter an error at this line (https://github.com/salesforce/LAVIS/blob/main/lavis/models/blip2_models/blip2_vicuna_instruct.py#L195) where 'text_output' does not exist (only 'text_input' does).
Thanks!
I think you should reimplement the vqa dataset so that 'text_output' exists in each training sample.
That could work - what is the 'text_output' field intended to represent?
But also, I mostly want to replicate the authors' training first! Is it correct that none of the datasets (e.g. okvqa) currently have a 'text_output' field?
Thanks for your question. Yes, you need to reimplement the vqa dataset. We suggest writing a wrapper class around the existing dataset classes. The "text_input" returns the instruction (e.g. "Question: {question} Answer:"). The "text_output" returns the answer.
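For anyone landing here, a minimal sketch of such a wrapper, not the authors' implementation. The class name `InstructVQAWrapper` is hypothetical. It assumes each base sample is a dict holding the raw question under "text_input" plus "answers" and an optional "weights" list. Picking the highest-weight answer as "text_output" is my own assumption, since the thread does not say how the answer was chosen. Batching/collation is not covered.

```python
from typing import Any, Dict, Sequence


class InstructVQAWrapper:
    """Hypothetical wrapper that adds 'text_output' to VQA-style samples."""

    def __init__(
        self,
        base_dataset: Sequence[Dict[str, Any]],
        prompt: str = "Question: {question} Answer:",
    ) -> None:
        self.base = base_dataset
        self.prompt = prompt

    def __len__(self) -> int:
        return len(self.base)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        # Shallow copy so the underlying sample is not modified.
        sample = dict(self.base[index])
        question = sample["text_input"]
        answers = sample["answers"]
        weights = sample.get("weights")
        if weights is None:
            weights = [1.0] * len(answers)
        # Assumption: use the highest-weight answer as the training target.
        best = max(range(len(answers)), key=lambda i: weights[i])
        sample["text_input"] = self.prompt.format(question=question)
        sample["text_output"] = answers[best]
        return sample


# Toy data standing in for a real VQA dataset (illustration only).
toy = [{
    "image": None,
    "text_input": "What color is the bus?",
    "answers": ["red", "blue"],
    "weights": [0.7, 0.3],
}]
wrapped = InstructVQAWrapper(toy)
print(wrapped[0]["text_input"])   # Question: What color is the bus? Answer:
print(wrapped[0]["text_output"])  # red
```

Sampling an answer in proportion to its weight instead of always taking the top one would be another reasonable choice; I have not compared the two.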
So did you just select one of the answers to be the text_output, since vqa has lots of possible answers?
So have you experimented with this? What is the format of 'text_output'? Also, it is not clear to me whether 'answers' and 'weights' are still necessary for instructblip. Thank you!
@dydxdt Hi, may I ask if you have experimented with this? I am also confused about the "answers" and "weights" when using blip2 or instructblip. Thanks!