
[vLLM] Default system prompt consistently prompts Qwen1.5-32B-Chat-GPTQ-Int4 to output "!" token

Open · garyfanhku opened this issue 3 months ago · 0 comments

Ref #269, #264: the new 32B model outputs repeated "!" tokens when deployed on vLLM. However, a slight tweak to the system prompt seems to address the issue. See below:

Modified system prompt:

 (base) gary.work@Garys-MacBook-Pro my-app % curl http://X/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "Qwen1.5-32B-Chat-GPTQ-Int4",
    "messages": [
    {"role": "system", "content": "You are a helpful AI assistant"},
    {"role": "user", "content": "Tell me something about large language models."}
    ]
    }'

{"id":"cmpl-4ea8d72b527849818685b6471e014d68","object":"chat.completion","created":1712566256,"model":"Qwen1.5-32B-Chat-GPTQ-Int4","choices":[{"index":0,"message":{"role":"assistant","content":"Large language models are a type of artificial intelligence that have been trained on massive amounts of text data, allowing them to generate human-like language and perform a wide range of natural language processing tasks. These models are typically built using deep learning techniques, specifically neural networks with multiple layers, and are often referred to as deep learning language models.\n\nOne of the key features of large language models is their ability to understand context and generate responses that are relevant and coherent. They are trained on a diverse set of text data, including books, articles, websites, and more, which enables them to learn patterns, relationships, and meanings of words and phrases in different contexts.\n\nLarge language models have been used for various applications, such as language translation, summarization, question-answering, chatbots, text generation, sentiment analysis, and more. Some well-known examples of large language models include BERT (Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer), and T5 (Text-to-Text Transfer Transformer).\n\nThese models have significantly advanced the field of natural language processing by achieving state-of-the-art performance on many tasks. However, they also come with challenges, such as the need for massive computational resources during training, concerns over bias and fairness in their training data, and the potential for生成 (misuse or unintended consequences) when deployed in real-world applications."},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":27,"total_tokens":304,"completion_tokens":277}}%   

Default system prompt:

    "model": "Qwen1.5-32B-Chat-GPTQ-Int4",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Tell me something about large language models."}
    ]
    }'
{"id":"cmpl-6749d49900574349bfcd5eebb4409454","object":"chat.completion","created":1712566278,"model":"Qwen1.5-32B-Chat-GPTQ-Int4","choices":[{"index":0,"message":{"role":"assistant","content":"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"},"logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":26,"total_tokens":8192,"completion_tokens":8166}}

Tested on vLLM 0.4.0.post1 with transformers 4.39.3, and on vLLM 0.3.0 with transformers 4.37.2.
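
For a quicker A/B comparison, the same pair of requests can be scripted; below is a minimal sketch against the vLLM OpenAI-compatible endpoint, assuming the `openai` v1 client, with the host a placeholder (the same `http://X` as in the curl calls above):

```python
# Minimal A/B repro sketch against a vLLM OpenAI-compatible server.
# Assumes `pip install openai` (v1 client); the base_url host is a
# placeholder, like the "http://X" in the curl examples above.
from openai import OpenAI

client = OpenAI(base_url="http://X/v1", api_key="EMPTY")

for system_prompt in (
    "You are a helpful assistant",     # default prompt -> "!!!" output
    "You are a helpful AI assistant",  # tweaked prompt -> normal output
):
    resp = client.chat.completions.create(
        model="Qwen1.5-32B-Chat-GPTQ-Int4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Tell me something about large language models."},
        ],
        max_tokens=128,  # cap output so the "!!!" case doesn't run to the 8k limit
    )
    print(f"{system_prompt!r} -> {resp.choices[0].message.content[:80]!r}")
```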

@JustinLin610 I've never seen this behavior before. Since the HF endpoint behaves normally, is it safe to assume the issue is vLLM-related? Also, could you kindly confirm whether the 32B model uses the same default system prompt as the 72B and 14B models?
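
For reference, one way to check which default system prompt each checkpoint's chat template injects is to render the template without a system turn; a quick sketch, assuming `transformers` and access to the public Hub repos:

```python
# Sketch: render each Qwen1.5 chat template without a system message to see
# what default system prompt (if any) the template inserts.
# Assumes `pip install transformers` and network access to the Hugging Face Hub.
from transformers import AutoTokenizer

for repo in ("Qwen/Qwen1.5-14B-Chat", "Qwen/Qwen1.5-32B-Chat", "Qwen/Qwen1.5-72B-Chat"):
    tok = AutoTokenizer.from_pretrained(repo)
    rendered = tok.apply_chat_template(
        [{"role": "user", "content": "hi"}],  # deliberately no system turn
        tokenize=False,
        add_generation_prompt=True,
    )
    # The first lines show the system block the template injected by default.
    print(repo, "->", rendered.splitlines()[:2])
```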

Thanks!

garyfanhku · Apr 08 '24 10:04