How can I run multi-image inference without lmdeploy or swift?

Open aabbc-cell opened this issue 9 months ago • 3 comments

I am using the code given at [https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5] and running InternVL-Chat-V1-5 across multiple V100 GPUs.

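import torch
from transformers import AutoModel, AutoTokenizer

# load_image is the preprocessing helper from the model card linked above
path = "./InternVL-Chat-V1-5"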
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto').eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
pixel_values = load_image('xxx.jpg', max_num=6).to(torch.bfloat16).cuda()

generation_config = dict(
    num_beams=1,
    max_new_tokens=512,
    do_sample=False,
)

# single-round single-image conversation
question = "describe this image"
response = model.chat(tokenizer, pixel_values, question, generation_config)

I would like to run multi-image inference similar to [https://lmdeploy.readthedocs.io/zh-cn/latest/inference/vl_pipeline.html#id5], but lmdeploy does not support dp. How should I do multi-image inference?

aabbc-cell • May 10 '24 08:05

Same question here.

BeiningWu • May 13 '24 12:05

What does dp mean?

irexyc • May 15 '24 08:05

> What does dp mean?

data parallel

aabbc-cell • May 20 '24 07:05
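
For readers unfamiliar with the term, data parallelism here means several independent workers, each holding its own copy of the model, with the inputs split among them. A minimal sketch of the input-splitting part follows. It assumes the workers are launched with a tool such as torchrun, which sets the `RANK` and `WORLD_SIZE` environment variables, and that each worker's model replica fits on the GPUs assigned to it. The helper names `shard` and `worker_identity` and the file names are hypothetical, not part of InternVL or lmdeploy.

```python
import os


def shard(items, rank, world_size):
    """Return the items handled by worker `rank`, assigned round-robin."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return items[rank::world_size]


def worker_identity():
    """Read (rank, world_size) from the environment; default to one worker."""
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    return rank, world_size


if __name__ == "__main__":
    rank, world_size = worker_identity()
    image_paths = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names
    for image_path in shard(image_paths, rank, world_size):
        # ASSUMPTION: each worker would call load_image(image_path) and
        # model.chat(...) here, on its own model replica.
        print(f"worker {rank} of {world_size} handles {image_path}")
```

Each worker writes its own results, so no communication between workers is needed; the outputs can be merged afterwards.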

You can now run multi-image inference with transformers by following this documentation:

https://internvl.readthedocs.io/en/latest/internvl2.0/quick_start.html#inference-with-transformers

czczup • Jul 30 '24 13:07
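
As a rough illustration of the pattern in the linked quick start, the sketch below concatenates the tiles of several images and passes the per-image tile counts to `model.chat`. The `<image>` placeholder and the `num_patches_list` keyword follow that documentation as I understand it; whether the remote code shipped with a given checkpoint (for example InternVL-Chat-V1-5) accepts them is an assumption to verify. The helper names `build_multi_image_prompt` and `chat_multi_image` are hypothetical.

```python
def build_multi_image_prompt(num_images, question):
    """Prefix the question with one numbered <image> placeholder per image."""
    header = "".join(f"Image-{i}: <image>\n" for i in range(1, num_images + 1))
    return header + question


def chat_multi_image(model, tokenizer, tiles_per_image, question, generation_config):
    """Run a single multi-image turn.

    tiles_per_image: list of tensors, one per image, each as returned by
    the load_image helper (first dimension = number of tiles).
    """
    import torch  # imported lazily so the prompt helper works without torch

    # Stack the tiles of all images along the first dimension.
    pixel_values = torch.cat(tiles_per_image, dim=0)
    # Record how many tiles belong to each image.
    num_patches_list = [tiles.size(0) for tiles in tiles_per_image]
    prompt = build_multi_image_prompt(len(tiles_per_image), question)
    # ASSUMPTION: the loaded remote code's chat() accepts num_patches_list.
    return model.chat(tokenizer, pixel_values, prompt, generation_config,
                      num_patches_list=num_patches_list)
```

With the model and tokenizer from the original post, a call might look like `chat_multi_image(model, tokenizer, [tiles1, tiles2], "Describe the two images.", generation_config)`, where `tiles1` and `tiles2` come from `load_image(...)` and have been moved to the model's dtype and device.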