[Bug] Template inconsistency causes different results between vLLM and lmdeploy when using InternVL2
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
When running inference with InternVL2, I found that vLLM and lmdeploy produce inconsistent results. After investigation, it seems that the issue is caused by a template mismatch between the two frameworks.
- lmdeploy uses its built-in prompt template.
- vLLM relies on the `chat_template` defined in the model's configuration file.
In testing, I noticed that the prompt rendered by vLLM does not include the following system prompt section:

```
<|im_start|>system
你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,英文名叫 InternVL,是一个有用无害的人工智能助手。<|im_end|>
```

(English translation: "You are the Shusheng multimodal large model developed by Shanghai AI Laboratory in collaboration with SenseTime; your English name is InternVL. You are a helpful and harmless AI assistant.")
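To check which template vLLM picks up, the rendered prompt can be inspected directly from the checkpoint's tokenizer. Below is a minimal sketch, assuming the `chat_template` shipped in the HF tokenizer config is the one vLLM reads and that `transformers` is installed:

```python
# Sketch: render a chat turn with the checkpoint's own chat_template to see
# whether the default InternVL system prompt is included.
from transformers import AutoTokenizer

MODEL_PATH = "OpenGVLab/InternVL2-2B"  # same checkpoint as in the reproduction

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

messages = [{"role": "user", "content": "Who developed you?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# If the template lacks the default system block, the output begins with
# "<|im_start|>user" instead of the "<|im_start|>system ..." section quoted above.
```

If this is indeed the cause, `vllm serve` also accepts a `--chat-template` override, which might be one way to align it with lmdeploy's built-in template (not verified here).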
Reproduction
vllm:

```bash
vllm serve ${MODEL_PATH} \
    --enforce-eager \
    --trust-remote-code \
    --gpu-memory-utilization 0.6 \
    --port 8000
```
lmdeploy:

```bash
lmdeploy serve api_server ${MODEL_PATH} \
    --model-name ${MODEL_NAME} \
    --server-port 8000 \
    --tp 1
```
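The inconsistency can then be observed through the OpenAI-compatible endpoint that both servers expose. A minimal sketch, assuming each server is reachable on its own port (the commands above use 8000 for both, so either start them one at a time or move lmdeploy to 8001 as below); the lmdeploy model name `internvl2-2b` stands in for whatever `${MODEL_NAME}` was used:

```python
# Sketch: send the same greedy request to both servers and compare the replies.
import requests

QUESTION = "Who developed you?"  # identity question exposes the missing system prompt
SERVERS = {
    # backend -> (base URL, model name as registered with that server)
    "vllm": ("http://localhost:8000", "OpenGVLab/InternVL2-2B"),
    "lmdeploy": ("http://localhost:8001", "internvl2-2b"),  # placeholder for ${MODEL_NAME}
}

for backend, (base_url, model_name) in SERVERS.items():
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "model": model_name,
            "messages": [{"role": "user", "content": QUESTION}],
            "temperature": 0,  # greedy decoding, so differences come from the prompt template
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(f"[{backend}] {resp.json()['choices'][0]['message']['content']}")
```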
Environment
GPU: H20
vllm: 0.6.3
lmdeploy: 0.9.2
model: OpenGVLab/InternVL2-2B
Error traceback