[Bug]: OpenAI API server running but "POST /v1/chat/completions HTTP/1.1" 404 Not Found
Your current environment
ubuntu + k8s + vllm/vllm-openai:latest
🐛 Describe the bug
I keep getting the error message: "POST /v1/chat/completions HTTP/1.1" 404 Not Found
The command I use is:
containers:
  - command: [ "python3", "-m", "vllm.entrypoints.openai.api_server" ]
    args: [ "--model=/share_nfs/hf_models/llama2-70b-chat-hf",
            "--chat-template=/share_nfs/hf_models/llama2-70b-chat-hf/llama-2-chat.jinja",
            "--gpu-memory-utilization=0.9",
            "--disable-log-requests",
            "--trust-remote-code",
            "--port=8000",
            "--max-model-len=4096",
            "--max-num-seqs=512",
            "--max-seq_len-to-capture=4096",
            "--tensor-parallel-size=8" ]
And yes, I have set the correct entry point (vllm.entrypoints.openai.api_server), and yes, I have set the chat template, which I downloaded from here:
https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/llama-2-chat.jinja
There are no errors during startup, but I get a 404 for /v1/chat/completions.
BTW, /v1/models and /v1/completions both work fine.
What am I missing? Thanks!
Can you copy the logs of the OpenAI-compatible server? Make sure you're using the correct address/port.
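A quick way to check this from outside the pod is to hit the routes directly and compare status codes. This is a minimal sketch, assuming the service is reachable at http://localhost:8000 (e.g. via port-forward) and uses the model path from the deployment above; adjust both to your setup.

```python
import requests

BASE = "http://localhost:8000"  # assumption: adjust to your k8s service address/port
MODEL = "/share_nfs/hf_models/llama2-70b-chat-hf"  # must match what the server reports

# List the served models first; the "id" field here is what the chat endpoint expects.
print(requests.get(f"{BASE}/v1/models").json())

# Then POST a minimal chat completion. A 404 here means either the route is missing
# or the model name doesn't match; a 400/422 would point at the request body instead.
resp = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 8,
    },
)
print(resp.status_code, resp.text)
```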
Why are you using a custom chat template when that model comes with one?
https://huggingface.co/meta-llama/Llama-2-70b-chat-hf/blob/main/tokenizer_config.json#L12
@hmellor Thanks a lot. So I don't need to set --chat-template at all?
I don't think so. If there is already one in the tokenizer config for the model you're using, it should use that.
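If you want to confirm that, you can check the tokenizer directly. A minimal sketch using transformers, assuming the same local model path as in the deployment above:

```python
from transformers import AutoTokenizer

# Load the tokenizer from the same path the server uses.
tok = AutoTokenizer.from_pretrained("/share_nfs/hf_models/llama2-70b-chat-hf")

# A non-None chat_template means the model already ships a template
# and --chat-template is unnecessary.
print(tok.chat_template is not None)

# Optional: render a sample conversation to see the built-in template in action.
print(tok.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
))
```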
I had a similar problem. It turned out that the model I had deployed with vLLM was different from the one I set in the OpenAI client.
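For reference, a sketch of keeping the client-side model name in sync with what the server actually serves (a mismatched name can also surface as a 404 from this endpoint). This assumes the official openai Python client and the address/port from the deployment above:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumption: your vLLM service address
    api_key="EMPTY",                      # vLLM ignores the key unless --api-key is set
)

# Ask the server which model it is serving and reuse that exact id.
served_model = client.models.list().data[0].id

resp = client.chat.completions.create(
    model=served_model,  # avoids hard-coding a name that doesn't match the server
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=8,
)
print(resp.choices[0].message.content)
```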