
Questions about reproducing BLIP-2 examples

Open yonatanbitton opened this issue 1 year ago • 11 comments

Hi. I'm trying to use your colab.

I'm trying the most powerful model (the default in the colab):

model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xxl", is_eval=True, device=device
)

I'm trying to reproduce two examples using the question-answering prompt format from the colab:

ans = model.generate({"image": image, "prompt": f"Question: {question} Answer:"})

I also tried feeding the question as the prompt, as is:

ans = model.generate({"image": image, "prompt": f"{question}"})
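To compare the two prompt styles side by side, a small helper along these lines might be convenient. This is only a sketch: `build_prompts` and `compare_prompts` are hypothetical names, not part of LAVIS, and the only library call assumed is `model.generate` with the `{"image": ..., "prompt": ...}` dictionary used above. Any extra keyword arguments are forwarded unchanged, and whether the model accepts them is an assumption to check against the installed LAVIS version.

```python
def build_prompts(question):
    """Return the two prompt styles tried above, keyed by a short label."""
    return {
        "qa_template": f"Question: {question} Answer:",
        "raw_question": f"{question}",
    }


def compare_prompts(model, image, question, **generate_kwargs):
    """Call model.generate once per prompt style and collect the outputs.

    generate_kwargs are forwarded untouched (hypothetical passthrough);
    which keywords the model accepts depends on the LAVIS version.
    """
    results = {}
    for label, prompt in build_prompts(question).items():
        samples = {"image": image, "prompt": prompt}
        results[label] = model.generate(samples, **generate_kwargs)
    return results
```

With the `model` and `image` from the colab, `compare_prompts(model, image, question)` would return one output per prompt style, which makes it easier to see whether the template or the raw question gets closer to the paper's figure.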

I'm trying to reproduce this example: [image]

I receive the following output: [image]

  • I receive only pepperoni, and not the other ingredients

And for this one: [image]

This is the output I receive: [image]

  • I don't receive any explanation beyond "yes". The paper figure shows "it's a house that looks like it's upside down"

How can I receive the behavior described in the paper?

Thanks

yonatanbitton, Feb 02 '23 21:02