
CUDA error: batch inference with InstructBLIP

zhangqingwu opened this issue 1 year ago · 15 comments

When batch_size=1, inference runs normally:

model, vis_processors, _ = load_model_and_preprocess(name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device)
test_dataset = DatasetInstructBILPImage(transformer=vis_processors, pkl_label_file=pkl_label)
test_dataloader = DataLoader(test_dataset, batch_size=1, num_workers=0)
prompt = "Write a short description for the image."
with torch.no_grad():
    for sample in test_dataloader:
        image = sample["image"].cuda()  # to(device, torch.float16)
        text_output = model.generate({"image": image, "prompt": [prompt] * image.size()[0]})

When batch_size is set to 2, the first batch runs inference normally, and then this problem is encountered:

test_dataloader = DataLoader(test_dataset, batch_size=2, num_workers=0)

zhangqingwu avatar May 25 '23 13:05 zhangqingwu

Same problem, cannot run inference when the batch size is greater than 1.

Scarecrow0 avatar Jun 01 '23 02:06 Scarecrow0

Same problem

24-solar-terms avatar Jun 07 '23 07:06 24-solar-terms

I cannot reproduce this error. May I know your transformers version?

LiJunnan1992 avatar Jun 09 '23 00:06 LiJunnan1992

@LiJunnan1992 I use transformers==4.29.1

24-solar-terms avatar Jun 09 '23 01:06 24-solar-terms

I still cannot reproduce this error. I can successfully run batch inference with the following code.

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda") if torch.cuda.is_available() else "cpu"

model, vis_processors, _ = load_model_and_preprocess(name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device)

img_path = "docs/_static/merlion.png"
raw_image = Image.open(img_path).convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
image = torch.cat([image, image], dim=0)
prompt = ["Describe the image in details.", "Which city is this?"]
model.generate({"image": image, "prompt": prompt})

LiJunnan1992 avatar Jun 09 '23 07:06 LiJunnan1992

Add outputs[outputs == -1] = 1 to https://github.com/salesforce/LAVIS/blob/59273f651b9bffb193d1b12a235e909e9f826dda/lavis/models/blip2_models/blip2_vicuna_instruct.py#L372. You can give it a try.
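
For context, a minimal sketch of how that clamp would sit between generation and decoding. The helper below is hypothetical (it is not the repository's code); it only assumes that generate() returns a LongTensor of token ids and that the tokenizer exposes batch_decode.

import torch

# Hypothetical helper, not the LAVIS source: clamp any -1 ids (which are not valid
# vocabulary indices and break decoding / CUDA indexing) before calling batch_decode.
def decode_generated_ids(outputs: torch.Tensor, tokenizer):
    outputs = outputs.clone()
    outputs[outputs == -1] = 1  # replace invalid ids with a valid token id
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)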

zhangqingwu avatar Jun 09 '23 14:06 zhangqingwu

I think it may be caused by torch.DDP. I adopted the training and evaluation loop of BLIP-2 OPT for InstructBLIP Vicuna and ran into this error. @LiJunnan1992 Do you have plans to release an implementation of the train/evaluation loop for InstructBLIP? It would help a lot, thanks!

Scarecrow0 avatar Jun 12 '23 03:06 Scarecrow0

add outputs[outputs == -1] = 1 to

https://github.com/salesforce/LAVIS/blob/59273f651b9bffb193d1b12a235e909e9f826dda/lavis/models/blip2_models/blip2_vicuna_instruct.py#L372

You can give it a try.

Will this modification work? The error occurs within llm_model.generate().

Scarecrow0 avatar Jun 12 '23 03:06 Scarecrow0

I still cannot reproduce this error. I can successfully run batch inference with the following code.

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda") if torch.cuda.is_available() else "cpu"

model, vis_processors, _ = load_model_and_preprocess(name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device)

img_path = "docs/_static/merlion.png"
raw_image = Image.open(img_path).convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
image = torch.cat([image, image], dim=0)
prompt = ["Describe the image in details.", "Which city is this?"]
model.generate({"image": image, "prompt": prompt})

I am getting the same error for this code too, but as soon as I change to FlanT5, the error disappears. I'm pretty sure this has something to do with the Vicuna-7B generate function.

STK101 avatar Jun 23 '23 05:06 STK101

Same problem, I cannot run inference when batch_size_eval > 1. Did you resolve this issue?

ustcwhy avatar Jul 09 '23 07:07 ustcwhy

Same problem, I cannot run inference when batch_size_eval > 1. Did you resolve this issue?

Vicuna did not work for me; I just used FlanT5 instead. Just change the LLM you're loading:

model, vis_processors, _ = load_model_and_preprocess(name="blip2_t5_instruct", model_type="flant5xl", is_eval=True, device=device)
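
Putting that together with the batch example earlier in the thread, a minimal end-to-end sketch looks like the following (the image path and prompts are copied from LiJunnan1992's example; treat this as illustrative rather than a tested script):

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda") if torch.cuda.is_available() else "cpu"

# Load the FlanT5-based InstructBLIP instead of the Vicuna-based one.
model, vis_processors, _ = load_model_and_preprocess(name="blip2_t5_instruct", model_type="flant5xl", is_eval=True, device=device)

raw_image = Image.open("docs/_static/merlion.png").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
image = torch.cat([image, image], dim=0)  # batch of 2
prompt = ["Describe the image in details.", "Which city is this?"]
print(model.generate({"image": image, "prompt": prompt}))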

STK101 avatar Jul 10 '23 06:07 STK101

I've encountered the same issue as well. Is there a solution for it?

Zhiyuan-Fan avatar Aug 01 '23 15:08 Zhiyuan-Fan

Is there a solution for it? When the batch size is more than 1, running the code below gives an error:

def generate(self, images, questions, ocr_tokens=None):
    processed_images = [Image.open(img_path).convert("RGB") for img_path in images]

    prompts = []
    for i in range(len(questions)):
        token = ocr_tokens[i] if ocr_tokens and ocr_tokens[i] is not None else ''
        prompt = f"<Image> OCR tokens: {token}. Question: {questions[i]} Short answer:"
        prompts.append(prompt)
    inputs = self.processor(images=processed_images, text=prompts, return_tensors="pt", padding='longest', truncation=True).to(self.device)
    with torch.no_grad():
        generated_texts = self.model.generate(**inputs,
                                              do_sample=False,
                                              num_beams=1,
                                              max_length=256,
                                              min_length=1,
                                              top_p=0.9,
                                              repetition_penalty=1.5,
                                              length_penalty=1.0,
                                              temperature=1)
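
As a side note, model.generate here returns token ids rather than strings, so a decode step along these lines is usually needed afterwards. This is a sketch that assumes self.processor is a Hugging Face InstructBLIP-style processor, as the call above suggests:

# Hypothetical helper, assuming a Hugging Face processor: convert generated token ids
# into strings, dropping special tokens such as padding.
def decode_outputs(processor, generated_ids):
    texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    return [t.strip() for t in texts]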

Cuiunbo avatar Sep 22 '23 13:09 Cuiunbo

Modify the config.json file of the Vicuna model and change "pad_token_id": -1 to "pad_token_id": 2. This happens because, during batch generation, the model sometimes produces pad_token_id = -1, which is not a valid token id.

https://github.com/huggingface/transformers/issues/22546#issuecomment-1561257076
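
For reference, the same fix can be sketched in code instead of editing config.json on disk. Assumptions: the LAVIS InstructBLIP Vicuna model exposes its language model as model.llm_model (as in blip2_vicuna_instruct.py), and 2 is the </s> token id in the LLaMA/Vicuna vocabulary.

import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda") if torch.cuda.is_available() else "cpu"

model, vis_processors, _ = load_model_and_preprocess(name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device)

# Override the invalid pad_token_id = -1 that ships in Vicuna's config.json.
model.llm_model.config.pad_token_id = 2

Editing config.json as described above is what was actually reported to work; the in-code override is only equivalent if nothing reloads the config afterwards.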

zzzzzero avatar Oct 07 '23 09:10 zzzzzero

Modify the config.json file of the Vicuna model and change "pad_token_id": -1 to "pad_token_id": 2. This happens because, during batch generation, the model sometimes produces pad_token_id = -1, which is not a valid token id.

huggingface/transformers#22546 (comment)

This seems unrelated, but it actually solves the issue.

kochsebastian avatar Nov 09 '23 16:11 kochsebastian