Batch inference, multiple images per sample
Hi,
The documentation does not explain how to perform batch inference with multiple images per sample. It only covers batch inference with a single image per sample:
# batch inference, single image per sample
pixel_values1 = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=12).to(torch.bfloat16).cuda()
num_patches_list = [pixel_values1.size(0), pixel_values2.size(0)]
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
questions = ['<image>\nDescribe the image in detail.'] * len(num_patches_list)
responses = model.batch_chat(tokenizer, pixel_values,
                             num_patches_list=num_patches_list,
                             questions=questions,
                             generation_config=generation_config)
for question, response in zip(questions, responses):
    print(f'User: {question}\nAssistant: {response}')
Is it possible to perform batch inference with multiple images per sample? If so, how?
Thanks
We suggest you use LMDeploy for multi-image batch inference; you can refer to their documentation.
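For example, here is a minimal LMDeploy sketch. The model path is a placeholder, and whether multi-image samples batch as (prompt, images) tuples this way may depend on your LMDeploy version:

from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Model path is an assumption; use whichever InternVL checkpoint you serve.
pipe = pipeline('OpenGVLab/InternVL2-8B')

img1 = load_image('./examples/image1.jpg')
img2 = load_image('./examples/image2.jpg')
img3 = load_image('./examples/image3.jpg')

# One batch with two samples: the first has one image, the second has two.
prompts = [
    ('Describe the image in detail.', [img1]),
    ('Describe the two images in detail.', [img2, img3]),
]
responses = pipe(prompts)
for r in responses:
    print(r.text)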
If you want to run inference with the transformers backend, you can refer to the following code. Note that the two images of the second sample share a single num_patches_list entry (the sum of their patch counts), and its question contains one <image> placeholder per image:
# batch inference, multiple images per sample
pixel_values1 = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values3 = load_image('./examples/image3.jpg', max_num=12).to(torch.bfloat16).cuda()
num_patches_list = [pixel_values1.size(0), pixel_values2.size(0) + pixel_values3.size(0)]
pixel_values = torch.cat((pixel_values1, pixel_values2, pixel_values3), dim=0)
questions = ['<image>\nDescribe the image in detail.', '<image>\n<image>\nDescribe the image in detail.']
responses = model.batch_chat(tokenizer, pixel_values,
                             num_patches_list=num_patches_list,
                             questions=questions,
                             generation_config=generation_config)
for question, response in zip(questions, responses):
    print(f'User: {question}\nAssistant: {response}')
What error are you getting, and how are you loading the model?
The documentation on Hugging Face may provide more clarity on batch inference with Transformers.
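For reference, a generic text-only Transformers batching sketch (the model name is a placeholder; decoder-only models need left padding for batched generation):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2', padding_side='left')
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained('gpt2')

prompts = ['Describe a cat.', 'Describe a dog.']
inputs = tokenizer(prompts, return_tensors='pt', padding=True)
outputs = model.generate(**inputs, max_new_tokens=32,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))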
The <image> token replacement logic in batch_chat, when a sample contains multiple images:
queries = []
for idx, num_patches in enumerate(num_patches_list):
    question = questions[idx]
    if pixel_values is not None and '<image>' not in question:
        question = '<image>\n' + question
    template = get_conv_template(self.template)
    template.system_message = self.system_message
    template.append_message(template.roles[0], question)
    template.append_message(template.roles[1], None)
    query = template.get_prompt()
    image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.num_image_token * num_patches + IMG_END_TOKEN
    # replace(..., 1) substitutes only the first '<image>' placeholder, so any
    # additional '<image>' tags in a multi-image question are left untouched.
    query = query.replace('<image>', image_tokens, 1)
    queries.append(query)
So batch_chat() can't replace <image> with IMAGE_TOKENS correctly when a sample contains multiple images.
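As a possible workaround (not the library's official fix), each '<image>' placeholder could be expanded in turn with its own patch count. The token strings and the num_image_token default below mirror InternVL's values but are assumptions here:

from typing import List

IMG_START_TOKEN = '<img>'
IMG_CONTEXT_TOKEN = '<IMG_CONTEXT>'
IMG_END_TOKEN = '</img>'

def expand_image_placeholders(query: str, patches_per_image: List[int],
                              num_image_token: int = 256) -> str:
    # Replace one '<image>' placeholder per image, each with that image's own
    # patch count, instead of a single replacement per sample.
    for num_patches in patches_per_image:
        image_tokens = (IMG_START_TOKEN
                        + IMG_CONTEXT_TOKEN * num_image_token * num_patches
                        + IMG_END_TOKEN)
        query = query.replace('<image>', image_tokens, 1)
    return query

# e.g. the two-image sample above, with made-up patch counts of 7 and 5
query = '<image>\n<image>\nDescribe the image in detail.'
query = expand_image_placeholders(query, patches_per_image=[7, 5])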