LAVIS
InstructBLIP generates short and repetitive sentences.
Thanks for your awesome work on InstructBLIP. When I try to reproduce the result in Figure 5 of your paper, the output is short and repeats itself.
raw_image = Image.open("../docs/_static/Confusing-Pictures.jpg").convert("RGB")
question = "What is unusual about this image?"
# use "eval" processors for inference
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
question = txt_processors["eval"](question)
samples = {"image": image, "text_input": question}
display(raw_image.resize((596, 437)))
output = model.generate(samples=samples, use_nucleus_sampling=True, repetition_penalty=2.0, min_length=50)[0]
print(output)
The output is:
man on top of car drying clothes in the middle of the street during rush hour with taxi cabs and other cars driving by it is unusual for a man to be doing laundry in the middle of the street during rush hour while there are taxi cabs and other cars driving by
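To put a number on the repetition in the output above, here is a minimal sketch that measures how many word 4-grams repeat an earlier 4-gram. The helper name `repeated_ngram_ratio` and the choice of n=4 are assumptions for illustration; they are not part of LAVIS.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that repeat an earlier n-gram in the text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


# The output reported in this issue, copied verbatim.
output = (
    "man on top of car drying clothes in the middle of the street during "
    "rush hour with taxi cabs and other cars driving by it is unusual for "
    "a man to be doing laundry in the middle of the street during rush hour "
    "while there are taxi cabs and other cars driving by"
)

ratio = repeated_ngram_ratio(output, n=4)
print(f"repeated 4-gram ratio: {ratio:.2f}")
```

A ratio of 0 means no 4-gram occurs twice. Comparing this value across generation settings gives a rough, model-independent way to tell whether a change actually reduced repetition.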
I also tried other images, and most of the outputs are short. How can I reproduce the results shown in your paper? Thanks in advance :)
What model did you use?
I used vicuna-13b:
model, vis_processors, txt_processors = load_model_and_preprocess(name="blip2_vicuna_instruct", model_type="vicuna13b", is_eval=True, device=device)
Do you observe similar behavior with other models such as vicuna7b?
Same here. Did you try increasing the repetition penalty?
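One way to follow up on this suggestion is to try several penalty values with everything else held fixed. The sketch below assumes only the keyword arguments already used in the original post (`samples`, `use_nucleus_sampling`, `repetition_penalty`, `min_length`). The helper `sweep_repetition_penalty`, the penalty values, and the stand-in `fake_generate` are hypothetical and exist only so the example runs without LAVIS or a GPU.

```python
from typing import Any, Callable, Dict, List, Sequence


def sweep_repetition_penalty(
    generate_fn: Callable[..., List[str]],
    samples: Dict[str, Any],
    penalties: Sequence[float] = (1.0, 1.5, 2.0, 3.0),
    min_length: int = 50,
) -> Dict[float, str]:
    """Call generate_fn once per penalty and collect the first output of each call."""
    results: Dict[float, str] = {}
    for penalty in penalties:
        outputs = generate_fn(
            samples=samples,
            use_nucleus_sampling=True,
            repetition_penalty=penalty,
            min_length=min_length,
        )
        results[penalty] = outputs[0]
    return results


# Hypothetical stand-in for model.generate; it only echoes the settings it received.
def fake_generate(samples, use_nucleus_sampling, repetition_penalty, min_length):
    return [f"penalty={repetition_penalty} min_length={min_length}"]


samples = {"image": None, "text_input": "What is unusual about this image?"}
results = sweep_repetition_penalty(fake_generate, samples)
for penalty, text in results.items():
    print(penalty, "->", text)
```

With the real model, `model.generate` would be passed in place of `fake_generate`, together with the `samples` dict built in the original post. Because nucleus sampling is stochastic, outputs will differ between runs, so a single run per setting is only a rough indication.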