LAVIS
Regression in the blip_caption / base_coco downloads?
It looks like the blip_caption / base_coco models were updated and are no longer compatible with the code. Or is it a bug on my side? I'm running a cog container (pinned to "salesforce-lavis==1.0.0" and "torch==1.13.0"), and a fresh docker build of it no longer works.
Since it was working previously and the only moving part seems to be the model downloads, I'd guess a regression in blip_caption / base_coco?
Here's the runtime error:
caption = model.generate({"image": processed_image})
  File "/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/lavis/models/blip_models/blip_caption.py", line 188, in generate
    decoder_out = self.text_decoder.generate_from_encoder(
  File "/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/lavis/models/med.py", line 1360, in generate_from_encoder
    ...
  File "/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/lavis/models/med.py", line 219, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
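For context, the shape mismatch in the traceback can be reproduced with a minimal tensor example. The sizes 3 and 9 are taken from the error message; interpreting them as "batch of images" vs. "batch repeated per beam" is an assumption here, consistent with encoder states not being tiled for beam search:

```python
import torch

# Minimal reconstruction of the failing shapes from the traceback above.
# Assumption: dim 0 is the batch dimension of the attention tensors, and
# 9 = 3 images x 3 beams, i.e. the image features were not repeated per beam.
query_layer = torch.randn(3, 8, 16)  # batch of 3
key_layer = torch.randn(9, 8, 16)    # batch of 9

try:
    torch.matmul(query_layer, key_layer.transpose(-1, -2))
except RuntimeError as err:
    print(err)  # same "size of tensor a (3) must match ... (9)" complaint
```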
Which transformers version are you on? Can you update and try again?
@dxli94 I can replicate this issue with any transformers version from v4.27.0 and up. The latest release this works with is v4.26.1.
Seems to be the same issue as #227 and https://github.com/salesforce/LAVIS/issues/78#issuecomment-1476022738
Minimal reproducible example:
import requests
from PIL import Image
from lavis.models import load_model_and_preprocess
model, vis_processors, txt_processors = load_model_and_preprocess(
name="blip_caption", model_type="large_coco", is_eval=True
)
img_url = 'https://huggingface.co/spaces/Salesforce/BLIP2/resolve/main/sunset.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0)
captions = model.generate({"image": image})
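Until the root cause is fixed, the reports above imply pinning transformers below v4.27.0 (e.g. `pip install "transformers==4.26.1"`). A stdlib-only sketch of the version gate those reports describe; the version boundaries come from the comments in this thread, and the helper names are made up:

```python
def _parse(v: str) -> tuple:
    """Turn a 'major.minor.patch' string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))

# Per the comments above: v4.26.1 works, v4.27.0 and up break generate().
FIRST_BROKEN = "4.27.0"

def transformers_is_compatible(installed: str) -> bool:
    """True if the installed transformers release predates the breaking one."""
    return _parse(installed) < _parse(FIRST_BROKEN)

print(transformers_is_compatible("4.26.1"))  # True
print(transformers_is_compatible("4.27.0"))  # False
```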
How can I use my checkpoint downloaded from Hugging Face to caption my image data? If you know, please tell me. Thanks!