[Usage]: Do I need to specify chat-template for Qwen model?
Your current environment
Hi,
I did a full SFT on the Qwen 0.5B model using LLaMA-Factory, during which I specified the template parameter. I'm a little confused about whether I should use a chat template for the Qwen model when serving it with vLLM. I searched online but found little about the circumstances under which the "chat-template" parameter should be used.
Can anyone give me some suggestions? Thank you.
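For context, one way to check whether the exported checkpoint already carries a chat template (which, as far as I understand, vLLM can then pick up automatically without the --chat-template flag) is to inspect the tokenizer. A minimal sketch, assuming a hypothetical local path to the LLaMA-Factory output:

```python
from transformers import AutoTokenizer

# Assumption: hypothetical local path to the LLaMA-Factory SFT output.
tokenizer = AutoTokenizer.from_pretrained("path/to/qwen-0.5b-sft")

# If this prints a Jinja template, the checkpoint already ships one and the
# OpenAI-compatible server should pick it up without --chat-template.
print(tokenizer.chat_template)

# Render a sample conversation to see the exact prompt the model would receive.
if tokenizer.chat_template:
    messages = [{"role": "user", "content": "Hello!"}]
    print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```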
I think you must do it. When I tested Qwen1.5 with vLLM, I found that if I don't specify Qwen's chat template, the generated results suck.
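To illustrate the difference, here is a rough offline-inference sketch (using the public Qwen/Qwen1.5-0.5B-Chat checkpoint as a stand-in for your SFT model) that compares a raw prompt against one rendered through the tokenizer's chat template:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Assumption: the public chat checkpoint stands in for your own SFT output.
model_id = "Qwen/Qwen1.5-0.5B-Chat"

llm = LLM(model=model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
params = SamplingParams(temperature=0.7, max_tokens=128)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]

# Raw prompt: the model sees a bare continuation, with none of the special
# tokens it was instruction-tuned on.
raw_prompt = messages[0]["content"]

# Chat-template prompt: the ChatML markers (<|im_start|>, <|im_end|>) are added
# in the same format used during fine-tuning.
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

for name, prompt in [("raw", raw_prompt), ("chat template", chat_prompt)]:
    output = llm.generate([prompt], params)[0]
    print(f"--- {name} ---")
    print(output.outputs[0].text)
```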
Thank you for the answer. May I ask how to specify the template? I saw there are templates like template_baichuan.jinja for Baichuan and ChatGLM, but there is none for Qwen.
You can refer to the chat template at https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat and manually add it.
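In case it helps, a simplified ChatML-style template sketch is below; the file name template_qwen.jinja and the model path are placeholders, and the authoritative template is the chat_template field in tokenizer_config.json of the repo linked above.

```python
# Assumption: simplified ChatML-style template; the authoritative version is the
# chat_template field in tokenizer_config.json of the linked repo.
CHATML_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|im_start|>assistant\\n' }}"
    "{% endif %}"
)

# Write it out so it can be handed to the server (file name is a placeholder).
with open("template_qwen.jinja", "w") as f:
    f.write(CHATML_TEMPLATE)

# Then start the OpenAI-compatible server with something like:
#   python -m vllm.entrypoints.openai.api_server \
#       --model path/to/qwen-0.5b-sft \
#       --chat-template ./template_qwen.jinja
```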
@jeejeelee
Hey Jee! I added the chat template as you described above, but I noticed slower inference compared to other models I experimented with before, like Llama 2. Do you think that's normal? Each request took about 30 seconds to return, even with max_tokens = 128.
Do you think I should open a new issue for this?
https://qwen.readthedocs.io/zh-cn/latest/deployment/vllm.html
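Following those docs, a minimal client sketch against the OpenAI-compatible endpoint might look like this (it assumes the server is already running on localhost:8000 with your checkpoint; the model name and port are placeholders):

```python
from openai import OpenAI

# Assumption: the server is already running locally and was started with your
# SFT checkpoint; model name and port are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="path/to/qwen-0.5b-sft",  # must match the --model the server was launched with
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about large language models."},
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```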