
InstructBLIP generates short and repetitive sentences.

Richar-Du opened this issue 1 year ago · 4 comments

Thanks for your awesome work on InstructBLIP. When I try to reproduce the result in Figure 5 of your paper, the output is not ideal.

from PIL import Image
from IPython.display import display  # running in a Jupyter notebook

# model, vis_processors, txt_processors, and device come from
# lavis.models.load_model_and_preprocess (model details below)
raw_image = Image.open("../docs/_static/Confusing-Pictures.jpg").convert("RGB")
question = "What is unusual about this image?"
# use "eval" processors for inference
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
question = txt_processors["eval"](question)
samples = {"image": image, "text_input": question}
display(raw_image.resize((596, 437)))
output = model.generate(samples=samples, use_nucleus_sampling=True, repetition_penalty=2.0, min_length=50)[0]
print(output)

The output is:

man on top of car drying clothes in the middle of the street during rush hour with taxi cabs and other cars driving by it is unusual for a man to be doing laundry in the middle of the street during rush hour while there are taxi cabs and other cars driving by

I also tried other images; most of the outputs are short. How can I reproduce the results shown in your paper? Thanks in advance :)

Richar-Du · May 21 '23

What model did you use?

LiJunnan1992 · May 22 '23

I used vicuna-13b:

model, vis_processors, txt_processors = load_model_and_preprocess(name="blip2_vicuna_instruct", model_type="vicuna13b", is_eval=True, device=device)

Richar-Du · May 22 '23

Do you observe similar behavior with other models such as vicuna7b?
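
For example, the 7B checkpoint should load the same way (a sketch, assuming the same load_model_and_preprocess API with model_type="vicuna7b"; the comparison below reuses the samples dict and device from your snippet):

# hypothetical comparison run with the 7B checkpoint
model_7b, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device
)
print(model_7b.generate(samples=samples, use_nucleus_sampling=True, repetition_penalty=2.0, min_length=50)[0])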

LiJunnan1992 · May 22 '23


Same here~ Did you try increasing the repetition penalty?
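
Something like this might give longer, less repetitive output (a sketch, assuming generate() also accepts the usual LAVIS keyword arguments num_beams, max_length, and length_penalty; the exact values are just guesses to try):

# beam search instead of nucleus sampling, with a stronger repetition penalty
output = model.generate(
    samples=samples,
    use_nucleus_sampling=False,
    num_beams=5,
    max_length=256,
    min_length=80,
    repetition_penalty=3.0,
    length_penalty=1.0,
)[0]
print(output)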

ldfandian · May 25 '23