LAVIS
How to use BLIP2 finetuned model
I fine-tuned a model with this command:
python train.py --cfg-path lavis/projects/blip2/train/pretrain_stage2.yaml
My env is:
But when I use the fine-tuned model to generate a caption, this error occurs:
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.
My code is:
import torch
from omegaconf import OmegaConf
from lavis.common.registry import registry
from lavis.models import load_preprocess
from PIL import Image
import requests
device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
model_cls = registry.get_model_class("blip2_opt")
model = model_cls(img_size=224,vit_precision="fp32",freeze_vit=True)
model.load_checkpoint("/root/luo6/LAVIS/lavis/output/BLIP2/Pretrain_stage2/20230402224/checkpoint_9.pth")
model.eval()
cfg = OmegaConf.load(model_cls.default_config_path("pretrain_opt2.7b"))
preprocess_cfg = cfg.preprocess
vis_processors, txt_processors = load_preprocess(preprocess_cfg)
model.to(device)
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/LAVIS/assets/merlion.png'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
raw_image = raw_image.resize((224,224))
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
model.generate({"image": image})
My checkpoint file looks like this:
How do I use the fine-tuned model?
What is your transformers version?
Hi, have you solved this problem? Could you let me know how to solve it?
The latest release as of now is v1.0.2, from March 6th.
If you're using that, then you need a transformers version between 4.25.0 and 4.26.1, as specified here:
https://github.com/salesforce/LAVIS/blob/7aa83e93003dade66f7f7eaba253b10c459b012d/requirements.txt#L26
If you're using a newer version of transformers, then you need a version of LAVIS that includes this commit.
To install from HEAD you can use:
pip install git+https://github.com/salesforce/LAVIS.git
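To see which of the two cases applies, it may help to check the installed transformers version first. The snippet below is a minimal sketch, not part of LAVIS: the helper names `parse_version` and `in_supported_range` are made up for illustration, and treating both bounds (4.25.0 and 4.26.1) as inclusive is an assumption based on the range quoted above.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Bounds quoted in the comment above; inclusive ends are an assumption.
MIN_VERSION = (4, 25, 0)
MAX_VERSION = (4, 26, 1)


def parse_version(text):
    """Return the leading numeric components, e.g. '4.26.1' -> (4, 26, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def in_supported_range(text):
    """True if the version falls in the range quoted for LAVIS v1.0.2."""
    return MIN_VERSION <= parse_version(text) <= MAX_VERSION


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        if in_supported_range(installed):
            print(f"transformers {installed} is in the range quoted for LAVIS v1.0.2")
        else:
            print(f"transformers {installed} is outside that range; "
                  "consider installing LAVIS from HEAD or pinning transformers")
```

If the check reports a version outside the range, either pin transformers to a version inside it or install LAVIS from HEAD as shown above.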