[bug] Has the incorrect chat template on Hugging Face been fixed yet?
confirmed with internvl team, the chat template in huggingface is wrong, and they will fix it soon.
see https://huggingface.co/OpenGVLab/InternVL3-14B/blob/main/tokenizer_config.json#L271
Originally posted by @youkaichao in #23988
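The quote above is from a vLLM issue opened within the last two weeks. It reports that the InternVL team confirmed the problem exists, and that it may cause a 20-point drop.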
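I downloaded InternVL3-14B from HF two months ago and have been using it ever since. I have not noticed any obvious anomalies, and my local template is identical to the one currently in the HF repo: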
{%- if tools %} {{- '<|im_start|>system\n' }} {%- if messages[0]['role'] == 'system' %} {{- messages[0]['content'] }} {%- else %} {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }} {%- endif %} {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }} {%- for tool in tools %} {{- "\n" }} {{- tool | tojson }} {%- endfor %} {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }} {%- else %} {%- if messages[0]['role'] == 'system' %} {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }} {%- else %} {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }} {%- endif %} {%- endif %} {%- for message in messages %} {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %} {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }} {%- elif message.role == "assistant" %} {{- '<|im_start|>' + message.role }} {%- if message.content %} {{- '\n' + message.content }} {%- endif %} {%- for tool_call in message.tool_calls %} {%- if tool_call.function is defined %} {%- set tool_call = tool_call.function %} {%- endif %} {{- '\n<tool_call>\n{"name": "' }} {{- tool_call.name }} {{- '", "arguments": ' }} {{- tool_call.arguments | tojson }} {{- '}\n</tool_call>' }} {%- endfor %} {{- '<|im_end|>\n' }} {%- elif message.role == "tool" %} {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %} {{- '<|im_start|>user' }} {%- endif %} {{- '\n<tool_response>\n' }} {{- message.content }} {{- '\n</tool_response>' }} {%- if loop.last or (messages[loop.index0 + 1].role != 
"tool") %} {{- '<|im_end|>\n' }} {%- endif %} {%- endif %} {%- endfor %} {%- if add_generation_prompt %} {{- '<|im_start|>assistant\n' }} {%- endif %}
Does this bug actually exist? If so, has it been fixed? I don't see any new commit in the HF repo.
Thank you for your interest in our work. We have updated the tokenizer_config.json file of InternVL2.5 and InternVL3 to fix this issue. See this commit for more details.
Thanks for the update. I see the differences are in the system prompt, the multimodal tokens, and tool use. I didn't invoke tool use, and I have been using my own system prompt. It was also working well with vLLM adding the multimodal tokens automatically. I didn't notice any abnormality in production. Is this expected? What harmful effects does the old template have?
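For anyone who wants to check their own local copy against the updated file, here is a minimal sketch using only the Python standard library. It assumes you have the text of both the old and the new `tokenizer_config.json` and that the template is stored under the `chat_template` key. The helper names (`load_chat_template`, `split_template`, `diff_templates`) and the two toy configs are made up for illustration; they are not the real InternVL templates.

```python
import difflib
import json


def load_chat_template(config_text: str) -> str:
    """Return the `chat_template` string from the text of a tokenizer_config.json."""
    return json.loads(config_text).get("chat_template", "")


def split_template(template: str) -> list[str]:
    """Put each Jinja statement/expression on its own line so the diff is readable."""
    return template.replace("{%", "\n{%").replace("{{", "\n{{").splitlines()


def diff_templates(old_config_text: str, new_config_text: str) -> list[str]:
    """Unified diff of two chat templates; an empty list means they are identical."""
    old = split_template(load_chat_template(old_config_text))
    new = split_template(load_chat_template(new_config_text))
    return list(difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm=""))


# Toy configs (placeholders, NOT the real templates) that differ only in the system prompt.
LOOP = "{%- for m in messages %}{{- m.content }}{%- endfor %}"
old_cfg = json.dumps({"chat_template": "{{- 'PLACEHOLDER PROMPT A' }}" + LOOP})
new_cfg = json.dumps({"chat_template": "{{- 'PLACEHOLDER PROMPT B' }}" + LOOP})

for line in diff_templates(old_cfg, new_cfg):
    print(line)
```

In practice you would read the two files from disk (for example, your cached copy and a freshly downloaded one) and pass their contents to `diff_templates`. If the result is an empty list, your local template already matches.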