vllm

[Bug]: <|im_end|><|im_start|> is printed repeatedly after the conversation ends

Open huangshengfu opened this issue 10 months ago • 6 comments

Your current environment

 docker run --rm --runtime nvidia --gpus all --name vllm-qwen72b \
     -v ~/.cache/huggingface:/root/.cache/huggingface \
     -v /data1/Download/models/Qwen-72B-Chat-Int4:/data/shared/Qwen/Qwen-Chat \
     -p 8901:8000 --ipc=host \
     vllm/vllm-openai:latest --model /data/shared/Qwen/Qwen-Chat \
     --max-model-len 6400 --trust-remote-code --tensor-parallel-size 2 \
     --gpu-memory-utilization 0.9 --served-model-name qwen72b --api-key "xxxx"

🐛 Describe the bug

I ran into an issue while running the model in a Docker environment. The model is Qwen-72B, and the conversation does not end properly.

Apr 22 '24 03:04 huangshengfu

Same problem when using vllm+chatglm3+oneapi+fastgpt. Not sure which part is going wrong.

Apr 27 '24 10:04 lijiajun1997

This is probably a vllm problem. I haven't found a solution yet; please ping me if you find one.

Apr 28 '24 01:04 huangshengfu

Me too. I came across a fix for a similar problem, but I don't know how to apply it in vllm: https://zhuanlan.zhihu.com/p/695477673

May 06 '24 08:05 huangdehong

My problem is solved. I connect to fastgpt through oneapi, and adding the stop parameter <|im_end|> to the fastgpt config file fixed it.

May 06 '24 13:05 huangshengfu

Could you share it?

May 06 '24 14:05 lijiajun1997

"defaultConfig":{"stop": "<|im_end|>"}
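
For readers who call the vLLM server directly rather than through fastgpt, the same stop string can be passed in the request body via the standard OpenAI-style `stop` field. The following is only a minimal sketch: the host, port 8901, model name `qwen72b`, and the placeholder key are taken from the docker command in this issue, while the helper name `build_chat_request` is hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, user_text, stop):
    """Build (but do not send) a chat completion request with stop strings.

    Hypothetical helper for illustration; it only assembles the HTTP request.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        # Generation should halt as soon as any of these strings is produced.
        "stop": stop,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Values below mirror the docker command above; adjust for your deployment.
request = build_chat_request(
    base_url="http://localhost:8901",
    api_key="xxxx",
    model="qwen72b",
    user_text="Hello",
    stop=["<|im_end|>"],
)
# To actually send it against a running server:
#     urllib.request.urlopen(request)
```

Passing the stop string per request keeps the workaround on the client side, so it applies regardless of which gateway sits in front of the server.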

May 06 '24 14:05 huangshengfu

I solved it by adding the stop token's token ID to the request.

May 07 '24 08:05 QuanhuiGuan
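
A rough sketch of the token-ID approach from the last comment: vLLM's OpenAI-compatible server accepts a `stop_token_ids` list as an extra sampling parameter in the request body. The ID used here (151645 for `<|im_end|>`) is an assumption about the Qwen tokenizer and should be checked against the tokenizer of the model actually being served; the helper name `with_stop_token_ids` is hypothetical.

```python
import copy

# ASSUMPTION: ID of <|im_end|> in the Qwen tokenizer. Verify it with your own
# tokenizer before relying on it; other models use different IDs.
ASSUMED_IM_END_TOKEN_ID = 151645


def with_stop_token_ids(payload, token_ids):
    """Return a copy of a chat completion payload with stop_token_ids merged in.

    Hypothetical helper for illustration; the input payload is left unchanged.
    """
    updated = copy.deepcopy(payload)
    existing = updated.get("stop_token_ids", [])
    # Merge and de-duplicate so repeated calls do not grow the list.
    updated["stop_token_ids"] = sorted(set(existing) | set(token_ids))
    return updated


base_payload = {
    "model": "qwen72b",  # served model name from the docker command above
    "messages": [{"role": "user", "content": "Hello"}],
}
payload = with_stop_token_ids(base_payload, [ASSUMED_IM_END_TOKEN_ID])
```

The resulting `payload` is sent as the JSON body of a POST to `/v1/chat/completions`. Stopping on the token ID avoids depending on how the special token is rendered as text.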