
[Usage]: Do I need to specify chat-template for Qwen model?

Open xudong2019 opened this issue 10 months ago • 3 comments

Your current environment

Hi,

I did a full SFT on a Qwen 0.5B model using LLaMA-Factory, and I specified the `template` parameter during training. I'm a little confused about whether I should use a chat template with the Qwen model in vLLM. I searched online but found little on the circumstances under which the `chat-template` parameter should be used.

Can anyone give me some suggestions? Thank you.

xudong2019 avatar Apr 30 '24 06:04 xudong2019

I think you must. When I tested Qwen1.5 with vLLM, I found that if I don't specify Qwen's chat template, the generated results are very poor.

jeejeelee avatar Apr 30 '24 06:04 jeejeelee
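For reference, the chat template can be passed when launching vLLM's OpenAI-compatible server via the `--chat-template` flag; the model ID and template path below are examples, substitute your own:

```shell
# Launch vLLM's OpenAI-compatible server with an explicit chat template.
# Qwen/Qwen1.5-0.5B-Chat and ./qwen_chatml.jinja are placeholders.
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen1.5-0.5B-Chat \
    --chat-template ./qwen_chatml.jinja
```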

> I think you must. When I tested Qwen1.5 with vLLM, I found that if I don't specify Qwen's chat template, the generated results are very poor.

Thank you for the answer. May I ask how to specify the template? I saw there are `.jinja` templates such as `template_baichuan.jinja` for Baichuan and ChatGLM, but there is none for Qwen.

xudong2019 avatar May 06 '24 02:05 xudong2019

You can refer to the chat template at https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat and manually add it.
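For context, Qwen chat models use the ChatML format, and a chat template simply renders a list of messages into that string. Below is a minimal pure-Python sketch of what such a template produces; `render_chatml` is an illustrative helper, not a vLLM or Qwen API:

```python
# Minimal sketch of ChatML formatting (the prompt format Qwen chat models
# expect). Each message becomes <|im_start|>role\ncontent<|im_end|>, and a
# generation prompt opens the assistant turn for the model to complete.
def render_chatml(messages, add_generation_prompt=True):
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt

print(render_chatml([{"role": "user", "content": "Hello"}]))
# → <|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n
```

A `.jinja` chat template encodes exactly this loop, so passing it to vLLM ensures prompts match what the model saw during fine-tuning.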

jeejeelee avatar May 06 '24 03:05 jeejeelee

@jeejeelee

Hey Jee! I added the chat template as you described above, but I noticed slower inference compared to other models I experimented with before, like Llama 2. Do you think that's normal? Each generation request takes 30 seconds to return, even with `max_tokens = 128`.

Do you think I have to create a new ticket on this?

rsong0606 avatar Jun 28 '24 17:06 rsong0606

See the Qwen documentation's vLLM deployment guide: https://qwen.readthedocs.io/zh-cn/latest/deployment/vllm.html

XavierSpycy avatar Sep 09 '24 07:09 XavierSpycy