FastChat
Openai_api_server Error: 400 status code (no body)
I spawn an OpenAI-compatible server using the following docker-compose:
```yaml
version: "3"
services:
  fastchat-controller:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "21001:21001"
    entrypoint: ["python3.9", "-m", "fastchat.serve.controller", "--host", "0.0.0.0", "--port", "21001"]
  fastchat-model-worker:
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - /home/gianluca/.cache/huggingface:/root/.cache/huggingface
    image: fastchat:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    ipc: host
    entrypoint: ["python3.9", "-m", "fastchat.serve.vllm_worker", "--model-path", "microsoft/Phi-3-mini-4k-instruct", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002", "--num-gpus", "2"]
  fastchat-api-server:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "8000:8000"
    entrypoint: ["python3.9", "-m", "fastchat.serve.openai_api_server", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "8000"]
```
The Dockerfile for the image is:
```dockerfile
FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04
RUN apt-get update -y && apt-get install -y python3.9 python3.9-distutils curl
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
RUN pip3 install fschat vllm
RUN pip3 install fschat[model_worker,webui]
```
Everything works until the prompt length gets close to 4000 tokens (the size of the model's context window). As I approach that limit I keep getting the following error back: Error: 400 status code (no body).
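One way to get past the "(no body)" part: bypass the OpenAI client and read the raw HTTP response, since the server may well be sending an explanatory JSON body that the client swallows. Below is a minimal stdlib-only sketch; the URL, model name, and payload fields are assumptions based on the setup above, not verified against it.

```python
import json
import urllib.error
import urllib.request


def post_json(url: str, payload: dict) -> tuple[int, str]:
    """POST a JSON payload and return (status, body), including the body of
    4xx/5xx responses that urlopen surfaces as HTTPError."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        # The error response still carries a body; read it instead of losing it.
        return e.code, e.read().decode("utf-8")


# Hypothetical usage against the api-server above (uncomment to try):
# status, body = post_json(
#     "http://localhost:8000/v1/chat/completions",
#     {
#         "model": "Phi-3-mini-4k-instruct",  # assumed served model name
#         "messages": [{"role": "user", "content": "hi"}],
#         "max_tokens": 512,
#     },
# )
# print(status, body)
```

Printing `body` on the failing request should reveal the server's actual error message rather than just the status code.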
Could someone help me debug this? The prompt is still shorter than the context window, and the error message is not informative. Is there a debug mode I can use to gather more information about what is happening in the backend?
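One cause worth ruling out: OpenAI-compatible servers typically validate the prompt tokens *plus* the requested `max_tokens` against the context window, so a ~4000-token prompt can be rejected even though it fits on its own, because the reserved completion budget pushes the total past 4096. A toy sketch of that check (the 4096 default and names here are assumptions for illustration, not FastChat's actual code):

```python
def fits_context(prompt_tokens: int, max_new_tokens: int,
                 context_len: int = 4096) -> bool:
    """Mimic the usual server-side validation: the prompt and the requested
    completion budget must fit in the context window together."""
    return prompt_tokens + max_new_tokens <= context_len
```

If this is the issue, lowering `max_tokens` on the near-limit requests (or trimming the prompt) should make the 400 disappear.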