
Openai_api_server Error: 400 status code (no body)

GianlucaDeStefano opened this issue on Jul 07, 2024 · 0 comments

I spawn an OpenAI-compatible server using the following docker-compose file:

version: "3"
services:
  fastchat-controller:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "21001:21001"
    entrypoint: ["python3.9", "-m", "fastchat.serve.controller", "--host", "0.0.0.0", "--port", "21001"]
  fastchat-model-worker:
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - /home/gianluca/.cache/huggingface:/root/.cache/huggingface
    image: fastchat:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    ipc: host
    entrypoint: ["python3.9", "-m", "fastchat.serve.vllm_worker", "--model-path", "microsoft/Phi-3-mini-4k-instruct", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002", "--num-gpus","2"]
  fastchat-api-server:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "8000:8000"
    entrypoint: ["python3.9", "-m", "fastchat.serve.openai_api_server", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "8000"]

The image itself is built from the following Dockerfile:

FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04

RUN apt-get update -y && apt-get install -y python3.9 python3.9-distutils curl
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
RUN pip3 install fschat vllm
RUN pip3 install "fschat[model_worker,webui]"
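
Once the stack is up, I can confirm the worker registered with the controller by listing the models through the API server. A minimal check, assuming the 8000:8000 port mapping from the compose file above and that FastChat names the model after the last component of --model-path:

import requests

# List the models the OpenAI-compatible API server can route to.
# "Phi-3-mini-4k-instruct" is the expected name, derived from the
# --model-path in the worker's entrypoint above.
resp = requests.get("http://localhost:8000/v1/models")
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])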

Everything works until the prompt length gets close to 4000 tokens (the size of the model's context window). When I approach the limit, I keep getting the following error back: Error: 400 status code (no body). Could someone help me debug the issue? The prompt is still shorter than the context window, and the error message is not useful. Is there a debug mode I can use to gather more information on what is happening in the backend?
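
The client I use only surfaces "400 status code (no body)", so to see whether the server actually returns more detail, I call the endpoint directly with requests. A minimal repro sketch, assuming the same localhost:8000 mapping and model name as above (the repeated filler text is just a stand-in for a prompt near the 4k limit):

import requests

# Send a prompt near the 4k context limit straight to the FastChat
# OpenAI-compatible endpoint. Unlike the OpenAI client, requests
# exposes the raw response body, which may contain the real error
# message behind the bare "400 status code (no body)".
payload = {
    "model": "Phi-3-mini-4k-instruct",  # assumed name, from --model-path
    "messages": [{"role": "user", "content": "word " * 3800}],  # filler, roughly 3800 tokens
    "max_tokens": 256,
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.status_code)
print(resp.text)  # raw body, if the server sent one

One thing I want to rule out is whether prompt tokens plus max_tokens together exceed the window, since as far as I can tell the backend validates the combined length rather than the prompt alone.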
