
[Usage]: start aphrodite in docker with tensor parallel

kulievvitaly opened this issue 1 year ago

Your current environment

I have a server with 4x 3090 Ti GPUs. I can run Llama 3 70B with vLLM in docker with this command:

```shell
sudo docker run --shm-size=32g --log-opt max-size=10m --log-opt max-file=1 \
  --rm -it --gpus '"device=0,1,2,3"' -p 9000:8000 \
  --mount type=bind,source=/home/me/.cache,target=/root/.cache \
  vllm/vllm-openai:v0.5.3.post1 \
  --model casperhansen/llama-3-70b-instruct-awq \
  --tensor-parallel-size 4 --dtype half --gpu-memory-utilization 0.92 -q awq
```

I made multiple attempts to start aphrodite in docker with tensor parallelism. Non-standard argument names and insufficient documentation led to errors and strange behavior. Please add an example of how to run a Llama 3 70B model with exl2 quantization across 4 GPUs.

How would you like to use Aphrodite?

I want to run bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw, but I don't know how to run it with Aphrodite.

kulievvitaly avatar Jul 31 '24 20:07 kulievvitaly

We don't directly take command-line arguments in the docker launch command. You will have to supply them as environment variables - please see the .env file in the docker directory for examples. Multi-GPU is also included.
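As a rough illustration of the environment-variable approach, a `.env` might look like the sketch below. The variable names here (`MODEL_NAME`, `NUM_GPUS`, `QUANTIZATION`, `GPU_MEMORY_UTILIZATION`) are hypothetical placeholders, not confirmed against the actual file shipped in the docker directory; check that file for the real names.

```shell
# Hypothetical .env for Aphrodite's docker setup -- variable names are
# illustrative only; consult the .env in the repo's docker directory.
MODEL_NAME=bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw
NUM_GPUS=4                    # tensor-parallel across all four 3090 Ti cards
QUANTIZATION=exl2             # ExLlamaV2 quantization
GPU_MEMORY_UTILIZATION=0.92   # fraction of VRAM the engine may claim
```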

But you're right, our docker documentation is very much lacking. Next update has some docker overhauls. I will make sure to update the wiki.

AlpinDale avatar Aug 01 '24 02:08 AlpinDale

v0.6.0 has changed the docker image to accept arguments directly as CLI args. Please see the docker section in the documentation, or the snippet in the readme.
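For the v0.6.0+ style, a launch command analogous to the vLLM one above might look like the following sketch. The image name (`alpindale/aphrodite-openai`), the default port (2242), and the flag spellings are assumptions based on Aphrodite following vLLM-style argument conventions; verify them against the current README before use.

```shell
# Sketch of an Aphrodite v0.6.0+ docker launch with tensor parallelism.
# Image tag, port, and flag names are assumptions -- check the README.
sudo docker run --rm -it \
  --shm-size=32g \
  --gpus '"device=0,1,2,3"' \
  -p 2242:2242 \
  --mount type=bind,source=/home/me/.cache,target=/root/.cache \
  alpindale/aphrodite-openai:latest \
  --model bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw \
  --tensor-parallel-size 4 \
  --quantization exl2 \
  --gpu-memory-utilization 0.92
```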

AlpinDale avatar Sep 03 '24 13:09 AlpinDale