vllm-env
Set up the environment for vLLM users.
vLLM env
You can try the Docker image directly:
# it requires shm > 30% of RAM, otherwise it will raise a SIGBUS error
docker run --rm -it --entrypoint sh --gpus all --shm-size=80gb -p 8000:8000 kemingy/vllm:latest
Start an LLM service with tensor parallelism:
python -m vllm.entrypoints.openai.api_server \
--model mosaicml/mpt-30b-chat \
--tokenizer mosaicml/mpt-30b \
--tensor-parallel-size 4 \
--worker-use-ray \
--host 0.0.0.0 \
--port 8000
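Once the server is up, you can sanity-check it by listing the served models. A minimal sketch, assuming the openai Python SDK (>= 1.0) and the default host/port above:
from openai import OpenAI

# vLLM's OpenAI-compatible server ignores the API key; "EMPTY" is conventional.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Should print the model id passed via --model.
print([model.id for model in client.models.list()])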
Test
Run a single query with the OpenAI Python client:
python openai_client.py
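For reference, here is a minimal sketch of what such a client could look like, assuming the openai Python SDK (>= 1.0); the actual openai_client.py in this repo may differ:
from openai import OpenAI

# Point the client at the local vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mosaicml/mpt-30b-chat",
    messages=[{"role": "user", "content": "What is vLLM?"}],
)
print(response.choices[0].message.content)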
Build the Docker image
docker build -t vllm .
To build with a specific vLLM version, pass its commit hash as a build argument:
docker build --build-arg commit=66c54aa -t vllm .
Dev/Serving with envd
- Dev:
  envd up -f :build
- Serving:
  envd build -f :serving
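Both targets refer to functions in a build.envd file, written in envd's Python-like syntax. As a rough sketch of what those functions might contain (the bodies below are illustrative assumptions, not this repo's actual config):
def build():
    # Dev environment: base image plus CUDA and vLLM.
    base(os="ubuntu20.04", language="python3")
    install.cuda(version="11.8.0", cudnn="8")
    install.python_packages(name=["vllm"])

def serving():
    # Serving image: reuse the dev setup and expose the API server port.
    build()
    runtime.expose(envd_port=8000, host_port=8000, service="vllm")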