
How to use a BLIP2 finetuned model

luozhiping opened this issue 1 year ago • 2 comments

I finetuned a model using the command python train.py --cfg-path lavis/projects/blip2/train/pretrain_stage2.yaml. My environment is shown in the attached screenshot.

But when I use the finetuned model to generate a caption, this error occurs: RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.

My code is:

import torch
import requests
from omegaconf import OmegaConf
from PIL import Image

from lavis.common.registry import registry
from lavis.models import load_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Instantiate the BLIP-2 OPT model and load the finetuned checkpoint.
model_cls = registry.get_model_class("blip2_opt")
model = model_cls(img_size=224, vit_precision="fp32", freeze_vit=True)
model.load_checkpoint("/root/luo6/LAVIS/lavis/output/BLIP2/Pretrain_stage2/20230402224/checkpoint_9.pth")
model.eval()

# Build the image/text preprocessors from the default model config.
cfg = OmegaConf.load(model_cls.default_config_path("pretrain_opt2.7b"))
preprocess_cfg = cfg.preprocess
vis_processors, txt_processors = load_preprocess(preprocess_cfg)
model.to(device)

# Download a test image and preprocess it into a batched tensor.
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/LAVIS/assets/merlion.png'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
raw_image = raw_image.resize((224, 224))
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

model.generate({"image": image})

My checkpoint file looks like this (see attached screenshot).

How do I use the finetuned model?

luozhiping avatar Apr 04 '23 06:04 luozhiping

What is your transformers version?
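
A quick way to check it (assuming a pip-installed transformers) is:

python -c "import transformers; print(transformers.__version__)"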

dxli94 avatar Apr 12 '23 12:04 dxli94


Hi, have you solved this problem? Could you let me know how you solved it?

Sukeysun avatar Apr 27 '23 09:04 Sukeysun

The latest release as of now is v1.0.2 from March 6th.

If you're using that, then you need a transformers version between 4.25.0 and 4.26.1, as specified here: https://github.com/salesforce/LAVIS/blob/7aa83e93003dade66f7f7eaba253b10c459b012d/requirements.txt#L26
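
For example, to pin the newest version in that range with pip:

pip install "transformers==4.26.1"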

If you're using a newer version of transformers, then you need a version of LAVIS that includes this commit.

To install from HEAD you can use:

pip install git+https://github.com/salesforce/LAVIS.git
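
Once the versions line up, a minimal sketch for loading a finetuned checkpoint (using the standard LAVIS helpers; the checkpoint path below is a placeholder) could look like:

import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the base architecture together with its matching preprocessors.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

# Overwrite the pretrained weights with your finetuned checkpoint (placeholder path).
model.load_checkpoint("/path/to/your/checkpoint_9.pth")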

denis-ismailaj avatar Jun 11 '23 18:06 denis-ismailaj