internvl-chatv1.2-plus: how to pass multiple images to the model

Open shanren521 opened this issue 1 year ago • 5 comments

shanren521 avatar Apr 25 '24 03:04 shanren521

Hello, and thanks for your interest. Are you looking for multi-image question answering or batch inference?

czczup avatar Apr 26 '24 17:04 czczup

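Hello, and thanks for your interest. Are you looking for multi-image question answering or batch inference?

Hello, for multi-image question answering, what should the data format look like? Or is it not supported yet? Thanks.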

PsyQuant avatar May 07 '24 09:05 PsyQuant

I would like to know whether multi-image question answering is supported, that is, a single question that includes 5 images.

shanren521 avatar May 07 '24 17:05 shanren521

Version 1.5 supports multi-image question answering. For the format, see the README here: https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5

It looks roughly like this:

# multi-round multi-image conversation
# Assumes torch is imported and that model, tokenizer, generation_config and
# load_image are set up as described in the README linked above.
pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
# Concatenate both images along the first dimension so they are passed together
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)

# First round: ask about both images
question = "Describe the two pictures in detail"
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, response)

# Second round: pass the returned history to continue the conversation
question = "What are the similarities and differences between these two pictures"
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
print(question, response)

czczup avatar May 08 '24 16:05 czczup

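What I actually want to do is this: a single prompt contains 5 images, one of which belongs to the question, and the model picks one of the four images A, B, C and D and returns it as the answer.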

shanren521 avatar May 10 '24 02:05 shanren521

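What I actually want to do is this: a single prompt contains 5 images, one of which belongs to the question, and the model picks one of the four images A, B, C and D and returns it as the answer.

Hello, the current InternVL 2.0 already supports multiple images. Feel free to try it out. LINK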

G-z-w avatar Jul 16 '24 11:07 G-z-w
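
For the five-image use case described above (one question image plus four candidate images A to D), the text side of the input could be prepared roughly as follows. This is a minimal sketch, not the library's own code: it assumes that InternVL 2.0 marks each image in the prompt with an `<image>` placeholder and accepts a per-image tile count list (called `num_patches_list` here), as its README appears to describe. The helper names `build_choice_prompt` and `count_tiles` are hypothetical, and the tile lists are stand-ins for real image tensors.

```python
# Sketch only: the "<image>" placeholder and the num_patches_list convention are
# assumptions based on the InternVL 2.0 README; the helper names are hypothetical.

def build_choice_prompt(option_labels=("A", "B", "C", "D")):
    """Return a prompt with one <image> placeholder per image, in input order."""
    lines = ["Question image: <image>"]
    lines += [f"Option {label}: <image>" for label in option_labels]
    lines.append(
        "Which option image best matches the question image? "
        "Answer with a single letter."
    )
    return "\n".join(lines)


def count_tiles(tiles_per_image):
    """Return how many tiles each image contributes, in input order."""
    return [len(tiles) for tiles in tiles_per_image]


prompt = build_choice_prompt()

# Stand-in tile lists for 5 images. With real tensors from load_image, the
# counts would be [t.size(0) for t in tensors] instead.
fake_tiles = [[0] * 6, [0] * 4, [0] * 4, [0] * 2, [0] * 1]
num_patches_list = count_tiles(fake_tiles)

print(prompt)
print(num_patches_list)
```

With real inputs, the five per-image tensors would be concatenated with `torch.cat(..., dim=0)` in the same order as the placeholders, as in the snippet earlier in the thread, and passed to `model.chat` together with the prompt. The exact keyword for the tile counts should be checked against the InternVL 2.0 README before relying on it.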