InternVL: How to run multi-image inference without lmdeploy or swift


Open aabbc-cell opened this issue 9 months ago • 3 comments

I am using the code given at [https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5] and running InternVL-Chat-V1-5 on multiple V100 GPUs.

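import torch
from transformers import AutoModel, AutoTokenizer

# load_image is the preprocessing helper defined in the model card linked above
path = "./InternVL-Chat-V1-5"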
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto').eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
pixel_values = load_image('xxx.jpg', max_num=6).to(torch.bfloat16).cuda()

generation_config = dict(
    num_beams=1,
    max_new_tokens=512,
    do_sample=False,
)

# single-round single-image conversation
question = "describe this image"
response = model.chat(tokenizer, pixel_values, question, generation_config)

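I would like to run multi-image inference similar to the example at [https://lmdeploy.readthedocs.io/zh-cn/latest/inference/vl_pipeline.html#id5], but lmdeploy does not support dp (data parallelism). How should I do multi-image inference?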
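
For reference, a minimal sketch of how a multi-image call might be prepared with plain transformers. This rests on assumptions and is not a confirmed answer from the maintainers: it assumes the checkpoint's remote code accepts one `<image>` placeholder per image in the question, and that `model.chat` accepts a `num_patches_list` argument as shown in the InternVL model card examples (this may depend on the checkpoint revision). The helper name `build_multi_image_prompt` is hypothetical.

```python
from typing import List, Tuple

# Placeholder token as used in the InternVL model card examples (assumption).
IMAGE_TOKEN = "<image>"


def build_multi_image_prompt(tile_counts: List[int], question: str) -> Tuple[str, List[int]]:
    """Build a question with one numbered image placeholder per image.

    tile_counts[i] is the number of tiles (rows of pixel_values) that
    load_image produced for image i. Returns the prompt text and the
    per-image tile counts to pass along with the concatenated tensor.
    """
    if not tile_counts or any(n <= 0 for n in tile_counts):
        raise ValueError("tile_counts must be a non-empty list of positive integers")
    header = "".join(
        f"Image-{i}: {IMAGE_TOKEN}\n" for i in range(1, len(tile_counts) + 1)
    )
    return header + question, list(tile_counts)


# Hypothetical usage with the model and load_image from the snippet above
# (left as comments because it needs the model weights and GPUs):
#
# tiles = [load_image(p, max_num=6).to(torch.bfloat16).cuda() for p in ["a.jpg", "b.jpg"]]
# pixel_values = torch.cat(tiles, dim=0)  # stack the tiles of all images
# prompt, num_patches_list = build_multi_image_prompt(
#     [t.size(0) for t in tiles], "Describe the two images."
# )
# response = model.chat(tokenizer, pixel_values, prompt, generation_config,
#                       num_patches_list=num_patches_list)
```

If the loaded revision does not accept `num_patches_list`, the model card's older pattern of concatenating the tensors with `torch.cat` and asking a plain question may still apply, with all tiles treated as a single combined input.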