
[Usage]: start aphrodite in docker with tensor parallel

kulievvitaly opened this issue 1 year ago

Your current environment

I have a server with 4x 3090 Ti GPUs. I can run Llama 3 70B with vLLM in docker with this command:

```shell
sudo docker run --shm-size=32g --log-opt max-size=10m --log-opt max-file=1 \
  --rm -it --gpus '"device=0,1,2,3"' -p 9000:8000 \
  --mount type=bind,source=/home/me/.cache,target=/root/.cache \
  vllm/vllm-openai:v0.5.3.post1 \
  --model casperhansen/llama-3-70b-instruct-awq \
  --tensor-parallel-size 4 --dtype half --gpu-memory-utilization 0.92 -q awq
```

I made multiple attempts to start aphrodite in docker with tensor parallelism. Non-standard argument names and insufficient documentation led to errors and strange behavior. Please add an example of how to run a Llama 3 70B model with exl2 quantization across 4 GPUs.

How would you like to use Aphrodite?

I want to run bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw, but I don't know how to run it with Aphrodite.

kulievvitaly avatar Jul 31 '24 20:07 kulievvitaly

We don't directly take command-line arguments in the docker launch command. You will have to supply them as environment variables - please see the .env file in the docker directory for examples. Multi-GPU is also included.
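As a rough illustration of the environment-variable approach, a `.env` might look like the sketch below. The variable names here (`MODEL_NAME`, `NUM_GPUS`, `QUANTIZATION`, `GPU_MEMORY_UTILIZATION`) are hypothetical placeholders, not confirmed against the actual file shipped in the docker directory; check that file for the real names.

```shell
# Hypothetical .env for Aphrodite's docker setup -- variable names are
# illustrative only; consult the .env in the repo's docker directory.
MODEL_NAME=bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw
NUM_GPUS=4                    # tensor-parallel across all four 3090 Ti cards
QUANTIZATION=exl2             # ExLlamaV2 quantization
GPU_MEMORY_UTILIZATION=0.92   # fraction of VRAM the engine may claim
```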

But you're right, our docker documentation is very much lacking. Next update has some docker overhauls. I will make sure to update the wiki.

AlpinDale avatar Aug 01 '24 02:08 AlpinDale

v0.6.0 has changed the docker image to accept arguments directly as CLI args. Please see the docker section in the documentation, or the snippet in the readme.
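For the v0.6.0+ style, a launch command analogous to the vLLM one above might look like the following sketch. The image name (`alpindale/aphrodite-openai`), the default port (2242), and the flag spellings are assumptions based on Aphrodite following vLLM-style argument conventions; verify them against the current README before use.

```shell
# Sketch of an Aphrodite v0.6.0+ docker launch with tensor parallelism.
# Image tag, port, and flag names are assumptions -- check the README.
sudo docker run --rm -it \
  --shm-size=32g \
  --gpus '"device=0,1,2,3"' \
  -p 2242:2242 \
  --mount type=bind,source=/home/me/.cache,target=/root/.cache \
  alpindale/aphrodite-openai:latest \
  --model bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw \
  --tensor-parallel-size 4 \
  --quantization exl2 \
  --gpu-memory-utilization 0.92
```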

AlpinDale avatar Sep 03 '24 13:09 AlpinDale