server
server copied to clipboard
repeated answer:When I use vllm with Qwen-7b-chat the generated text is x lnot end until the maength, with the repeated answer
sampling_parameters = { "temperature": "0", "top_p": "0.5", "max_tokens": "300"}
python3 client.py