FastChat
Openai_api_server Error: 400 status code (no body)
I spawn an OpenAI-compatible server using the following docker-compose file:
version: "3"
services:
  fastchat-controller:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "21001:21001"
    entrypoint: ["python3.9", "-m", "fastchat.serve.controller", "--host", "0.0.0.0", "--port", "21001"]
  fastchat-model-worker:
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - /home/gianluca/.cache/huggingface:/root/.cache/huggingface
    image: fastchat:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    ipc: host
    entrypoint: ["python3.9", "-m", "fastchat.serve.vllm_worker", "--model-path", "microsoft/Phi-3-mini-4k-instruct", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002", "--num-gpus", "2"]
  fastchat-api-server:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "8000:8000"
    entrypoint: ["python3.9", "-m", "fastchat.serve.openai_api_server", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "8000"]
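For context, this is roughly how I exercise the API server. A 400 from the server usually carries a JSON body explaining the rejection, even when the client library reports "(no body)", so a minimal standard-library sketch that prints the raw error response looks like this (the model name and base URL are assumptions matching my setup):

```python
import json
import urllib.request
import urllib.error

def build_payload(prompt, model="microsoft/Phi-3-mini-4k-instruct", max_tokens=512):
    # Model name must match what the worker registered with the controller.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def post_chat(payload, base_url="http://localhost:8000/v1"):
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print(resp.status, resp.read().decode())
    except urllib.error.HTTPError as e:
        # Print the raw 400 body instead of swallowing it like the client does.
        print(e.code, e.read().decode())
```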
The image, instead, is built from the following Dockerfile:
FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04
RUN apt-get update -y && apt-get install -y python3.9 python3.9-distutils curl
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
RUN pip3 install fschat vllm
RUN pip3 install fschat[model_worker,webui]
Everything works until the prompt length gets close to 4000 tokens (the size of the model's context window). When I approach the limit, I keep getting the following error back: Error: 400 status code (no body).
Could someone help me debug the issue? The prompt is still shorter than the context window, and the error message is not useful. Is there a debug mode I can use to gather more information about what is happening in the backend?
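One thing I suspect, assuming the server budgets the requested max_tokens against the context window on top of the prompt tokens: a ~4000-token prompt plus a default max_tokens can exceed the 4096-token limit even though the prompt alone fits. A minimal sketch of that arithmetic (the function name is hypothetical, not a FastChat API):

```python
def check_token_budget(prompt_tokens: int, max_new_tokens: int, context_len: int = 4096) -> int:
    # Remaining room in the context window; a negative value means the
    # request would be rejected before any generation starts.
    return context_len - prompt_tokens - max_new_tokens
```

Under this assumption, a 4000-token prompt with max_tokens=512 overshoots by 416 tokens, which would explain a 400 while the prompt itself is still under the window.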