InternVL internvl-chatv1.2-plus: how do I pass multiple images to the model?

Apr 25 '24 03:04 shanren521

Hello, and thanks for your interest. Are you looking for multi-image question answering or batch inference?

Apr 26 '24 17:04 czczup

Hello, for multi-image question answering, what should the data format look like? Or is it not supported yet? Thanks.

May 07 '24 09:05 PsyQuant

I'd like to know whether multi-image question answering is supported, where a single question contains 5 images.

May 07 '24 17:05 shanren521

Version 1.5 supports multi-image question answering. See the README here for the format: https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5

It looks roughly like this:

# multi-round multi-image conversation
# load_image, model, tokenizer, and generation_config are defined as in the README linked above
pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
# stack the tiles of both images along the batch dimension
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)

# first round: history=None starts a new conversation
question = "Describe the two pictures in detail"
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, response)

# second round: pass the returned history to continue the conversation
question = "What are the similarities and differences between these two pictures"
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
print(question, response)

May 08 '24 16:05 czczup

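What I actually want to do is this: a single prompt contains 5 images. The question includes one image, and the model picks one of the four images A, B, C, and D and returns it as the answer.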

May 10 '24 02:05 shanren521

Hello, the current InternVL 2.0 already supports multiple images. You are welcome to try it out. LINK

Jul 16 '24 11:07 G-z-w
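
For the five-image multiple-choice case described in this thread (one query image plus candidates A to D), a minimal sketch of the text side of the prompt might look like the following. It assumes the model's chat interface accepts one `<image>` placeholder per image, in the same order as the concatenated pixel tensors. The helper names `build_choice_prompt` and `ordered_images` are hypothetical and are not part of InternVL. Check the model README for the exact prompt format before relying on this.

```python
from typing import List

# Assumption: the prompt marks each image position with this placeholder.
IMAGE_TOKEN = "<image>"


def build_choice_prompt(question: str, num_options: int = 4) -> str:
    """Build a prompt for one query image followed by labeled candidate images.

    Hypothetical helper: it only produces text and does not call the model.
    """
    if not 1 <= num_options <= 26:
        raise ValueError("num_options must be between 1 and 26")
    lines = [f"Query image: {IMAGE_TOKEN}"]
    for i in range(num_options):
        label = chr(ord("A") + i)  # A, B, C, ...
        lines.append(f"Option {label}: {IMAGE_TOKEN}")
    lines.append(question)
    lines.append("Answer with the letter of the matching option only.")
    return "\n".join(lines)


def ordered_images(query_path: str, option_paths: List[str]) -> List[str]:
    """Return image paths in the order the placeholders appear in the prompt."""
    if not option_paths:
        raise ValueError("at least one option image is required")
    return [query_path] + list(option_paths)


prompt = build_choice_prompt(
    "Which option shows the same object as the query image?"
)
paths = ordered_images("query.jpg", ["a.jpg", "b.jpg", "c.jpg", "d.jpg"])
print(prompt)
print(paths)
```

The images would then be loaded in the order given by `paths`, concatenated along dimension 0 as in the snippet earlier in the thread, and passed to `model.chat` together with `prompt`. Keeping the placeholder order and the tensor order identical is the point of the second helper.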