[Bug] Template inconsistency causes different results between vLLM and lmdeploy when using InternVL2
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
When running inference with InternVL2, I found that vLLM and lmdeploy produce inconsistent results. After investigation, it seems that the issue is caused by a template mismatch between the two frameworks.
- lmdeploy uses its built-in prompt template.
- vLLM relies on the `chat_template` defined in the model's configuration file.
In testing, I noticed that the prompt rendered by vLLM does not include the following system prompt section:

```
<|im_start|>system
你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,英文名叫 InternVL,是一个有用无害的人工智能助手。<|im_end|>
```

(English translation: "You are the Shusheng multimodal large model developed by Shanghai AI Laboratory in collaboration with SenseTime; your English name is InternVL. You are a helpful and harmless AI assistant.")
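To check which template vLLM picks up, the rendered prompt can be inspected directly from the checkpoint's tokenizer. Below is a minimal sketch, assuming the `chat_template` shipped in the HF tokenizer config is the one vLLM reads and that `transformers` is installed:

```python
# Sketch: render a chat turn with the checkpoint's own chat_template to see
# whether the default InternVL system prompt is included.
from transformers import AutoTokenizer

MODEL_PATH = "OpenGVLab/InternVL2-2B"  # same checkpoint as in the reproduction

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

messages = [{"role": "user", "content": "Who developed you?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# If the template lacks the default system block, the output begins with
# "<|im_start|>user" instead of the "<|im_start|>system ..." section quoted above.
```

If this is indeed the cause, `vllm serve` also accepts a `--chat-template` override, which might be one way to align it with lmdeploy's built-in template (not verified here).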
Reproduction
vllm:

```bash
vllm serve ${MODEL_PATH} \
    --enforce-eager \
    --trust-remote-code \
    --gpu-memory-utilization 0.6 \
    --port 8000
```
lmdeploy:

```bash
lmdeploy serve api_server ${MODEL_PATH} \
    --model-name ${MODEL_NAME} \
    --server-port 8000 \
    --tp 1
```
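The inconsistency can then be observed through the OpenAI-compatible endpoint that both servers expose. A minimal sketch, assuming each server is reachable on its own port (the commands above use 8000 for both, so either start them one at a time or move lmdeploy to 8001 as below); the lmdeploy model name `internvl2-2b` stands in for whatever `${MODEL_NAME}` was used:

```python
# Sketch: send the same greedy request to both servers and compare the replies.
import requests

QUESTION = "Who developed you?"  # identity question exposes the missing system prompt
SERVERS = {
    # backend -> (base URL, model name as registered with that server)
    "vllm": ("http://localhost:8000", "OpenGVLab/InternVL2-2B"),
    "lmdeploy": ("http://localhost:8001", "internvl2-2b"),  # placeholder for ${MODEL_NAME}
}

for backend, (base_url, model_name) in SERVERS.items():
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "model": model_name,
            "messages": [{"role": "user", "content": QUESTION}],
            "temperature": 0,  # greedy decoding, so differences come from the prompt template
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(f"[{backend}] {resp.json()['choices'][0]['message']['content']}")
```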
Environment
GPU: H20
vllm: 0.6.3
lmdeploy: 0.9.2
model: OpenGVLab/InternVL2-2B
Error traceback