
Regression in the blip_caption / base_coco downloads?

Open dchichkov opened this issue 2 years ago • 3 comments

It looks like the blip_caption / base_coco models were updated and are no longer compatible with the code? Or is it a bug on my side? I'm trying a cog container (pinned to "salesforce-lavis==1.0.0" and "torch==1.13.0"), and a fresh Docker build of it no longer works.

Since it was working previously and the only moving part seems to be the model downloads, I'd guess a regression in blip_caption / base_coco?

Here's the runtime error:

    caption = model.generate({"image": processed_image})
  File "/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/lavis/models/blip_models/blip_caption.py", line 188, in generate
    decoder_out = self.text_decoder.generate_from_encoder(
  File "/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/lavis/models/med.py", line 1360, in generate_from_encoder
  ...
  File "/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/lavis/models/med.py", line 219, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0

dchichkov avatar Mar 23 '23 20:03 dchichkov

Which transformers version are you on? Can you update and try again?

dxli94 avatar Mar 24 '23 05:03 dxli94

@dxli94 I can replicate this issue with any transformers version from v4.27.0 and up. The latest release this works with is v4.26.1.

Seems to be the same issue as #227 and https://github.com/salesforce/LAVIS/issues/78#issuecomment-1476022738

Minimal reproducible example:

import requests
from PIL import Image
from lavis.models import load_model_and_preprocess
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_caption", model_type="large_coco", is_eval=True
)
img_url = 'https://huggingface.co/spaces/Salesforce/BLIP2/resolve/main/sunset.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0)
captions = model.generate({"image": image})

denis-ismailaj avatar Apr 02 '23 14:04 denis-ismailaj

How can I use my checkpoint downloaded from Hugging Face to caption my image data? If you know, please tell me. Thanks.

AnonymousDestroyer avatar Apr 05 '23 17:04 AnonymousDestroyer
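One possible approach to the question above is to load the stock architecture and preprocessors, then swap in your own weights. This is an untested sketch: it assumes the LAVIS model exposes a load_checkpoint() method for loading weights from a path, and the file paths are placeholders for your own files.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the stock BLIP captioning architecture plus its preprocessors...
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)
# ...then overwrite the weights with your Hugging Face download
# (placeholder path — point it at wherever you saved the .pth file).
model.load_checkpoint("/path/to/your_checkpoint.pth")

raw_image = Image.open("my_image.jpg").convert("RGB")  # placeholder image
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
captions = model.generate({"image": image})
```

Make sure the checkpoint was trained for the same model_type you load, otherwise the state dict keys or shapes will not match.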