
[Bug]: OpenAI API server running but "POST /v1/chat/completions HTTP/1.1" 404 Not Found

Open yebangyu opened this issue 9 months ago • 5 comments

Your current environment

ubuntu + k8s + vllm/vllm-openai:latest

🐛 Describe the bug

I keep getting the error message: "POST /v1/chat/completions HTTP/1.1" 404 Not Found

The command I use is:

  containers:
    - command: [ "python3", "-m", "vllm.entrypoints.openai.api_server" ]
      args: [ "--model=/share_nfs/hf_models/llama2-70b-chat-hf",
              "--chat-template=/share_nfs/hf_models/llama2-70b-chat-hf/llama-2-chat.jinja",
              "--gpu-memory-utilization=0.9",
              "--disable-log-requests",
              "--trust-remote-code",
              "--port=8000",
              "--max-model-len=4096",
              "--max-num-seqs=512",
              "--max-seq_len-to-capture=4096",
              "--tensor-parallel-size=8" ]

And yes, I have already set the correct entry point (vllm.entrypoints.openai.api_server), and I have set the chat template, which was downloaded from here:

https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/llama-2-chat.jinja

There are no errors during startup, but I get a 404 for /v1/chat/completions.

BTW, /v1/models and /v1/completions both work fine.

What am I missing? Thanks!

yebangyu avatar May 07 '24 08:05 yebangyu

Can you copy the logs of the OpenAI-compatible server? Make sure you're using the correct address/port.
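For example, a minimal client-side sanity check (a sketch, not the exact reproduction from this issue; it assumes the server is reachable at localhost:8000 and was started without --api-key, so any key string is accepted; adjust base_url for your k8s service):

    # Minimal sanity check against the OpenAI-compatible server.
    # Assumption: the server is reachable at localhost:8000.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    # List the served models; the returned id is what /v1/chat/completions expects.
    served = [m.id for m in client.models.list().data]
    print(served)

    # Hit the chat completions route with a served model id.
    resp = client.chat.completions.create(
        model=served[0],
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)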

DarkLight1337 avatar May 07 '24 08:05 DarkLight1337

Why are you using a custom chat template when that model comes with one?

https://huggingface.co/meta-llama/Llama-2-70b-chat-hf/blob/main/tokenizer_config.json#L12

hmellor avatar May 07 '24 10:05 hmellor

> Why are you using a custom chat template when that model comes with one?
>
> https://huggingface.co/meta-llama/Llama-2-70b-chat-hf/blob/main/tokenizer_config.json#L12

@hmellor thanks a lot. So I do not need to set --chat-template at all?

yebangyu avatar May 09 '24 06:05 yebangyu

I don't think you need to. If there is already one in the tokenizer config for the model you're using, vLLM should use that.
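As a quick check, you can confirm the model ships its own template by rendering a prompt with the tokenizer directly (a sketch, assuming the local path from your command points at a full HF snapshot of the model):

    # Confirm the built-in chat template exists.
    # Assumption: the path from the original command is a complete model snapshot.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("/share_nfs/hf_models/llama2-70b-chat-hf")

    # If tokenizer_config.json carries a chat_template, this prints a rendered
    # prompt and --chat-template can be omitted when launching vLLM.
    print(tok.apply_chat_template(
        [{"role": "user", "content": "Hi"}],
        tokenize=False,
        add_generation_prompt=True,
    ))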

hmellor avatar May 09 '24 07:05 hmellor

I had a similar problem. In my case, the model I had deployed with vLLM was different from the model name I set in the OpenAI client.
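In other words, the model name passed to the client has to match what the server actually serves; otherwise the chat completions route can come back with a 404. A minimal sketch (the host/port is an assumption; the served model id here is assumed to default to the --model value from this issue, unless --served-model-name was set):

    # The `model` field must match a served model id.
    from openai import OpenAI

    client = OpenAI(base_url="http://vllm-service:8000/v1", api_key="EMPTY")

    resp = client.chat.completions.create(
        # Assumed to equal the --model path (or the --served-model-name, if set).
        model="/share_nfs/hf_models/llama2-70b-chat-hf",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)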

kbzowski avatar May 15 '24 09:05 kbzowski