[Bug]: ModuleNotFoundError: No module named 'ray'
Your current environment
N/A
🐛 Describe the bug
Hello, I'm running the provided quickstart Docker command and getting the following error:
INFO: Multiprocessing frontend to use
ipc:///tmp/3f2ae52b-cfde-4764-ad60-361c1c2ced18 for RPC Path.
INFO: Started engine process with PID 57
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/aphrodite/executor/ray_utils.py", line 13, in <module>
import ray
ModuleNotFoundError: No module named 'ray'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/rpc/server.py", line 214, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, rpc_path)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/rpc/server.py", line 29, in __init__
self.engine = AsyncAphrodite.from_engine_args(async_engine_args)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 703, in from_engine_args
engine_config = engine_args.create_engine_config()
File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/args_tools.py", line 936, in create_engine_config
parallel_config = ParallelConfig(
File "/usr/local/lib/python3.10/dist-packages/aphrodite/common/config.py", line 963, in __init__
raise ValueError("Unable to load Ray which is "
ValueError: Unable to load Ray which is required for multi-node inference, please install Ray with `pip install ray`.
No module named 'ray'
^C
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/api_server.py", line 802, in <module>
Can you share your Docker command? We should not use Ray unless you launch the engine with --worker-use-ray or --distributed-executor-backend=ray.
Hey, thanks for the quick reply. I was trying the command from the Docker section of the README.md: "Additionally, we provide a Docker image for easy deployment. Here's a basic command to get you started:"
#--env "CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7"
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 2242:2242 \
--ipc=host \
alpindale/aphrodite-openai:latest \
--model NousResearch/Meta-Llama-3.1-8B-Instruct \
--tensor-parallel-size 8 \
--api-keys "sk-empty"
Can you add --distributed-executor-backend=mp to the launch flags?
With the default flags I get the same error:
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7" \
-p 2242:2242 \
--ipc=host \
alpindale/aphrodite-openai:latest \
--model NousResearch/Meta-Llama-3.1-8B-Instruct \
--tensor-parallel-size 8 \
--api-keys "sk-empty"
Unable to find image 'alpindale/aphrodite-openai:latest' locally
latest: Pulling from alpindale/aphrodite-openai
3c645031de29: Pull complete
0d6448aff889: Pull complete
0a7674e3e8fe: Pull complete
b71b637b97c5: Pull complete
56dc85502937: Pull complete
c1c890480c74: Pull complete
93929e83ed21: Pull complete
0ead3d2f76c1: Pull complete
60cdee2e316d: Pull complete
518f3d7cac80: Pull complete
336c5995c4b2: Pull complete
Digest: sha256:8bac4170be255c19d29d84ffbdeabdc1b0a09ee511bec7ed0026e349db430357
Status: Downloaded newer image for alpindale/aphrodite-openai:latest
INFO: Multiprocessing frontend to use
ipc:///tmp/535bb624-82bb-42e8-bbc7-5ea63814857e for RPC Path.
INFO: Started engine process with PID 46
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/aphrodite/executor/ray_utils.py", line 13, in <module>
import ray
ModuleNotFoundError: No module named 'ray'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/rpc/server.py", line 214, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, rpc_path)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/rpc/server.py", line 29, in __init__
self.engine = AsyncAphrodite.from_engine_args(async_engine_args)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 703, in from_engine_args
engine_config = engine_args.create_engine_config()
File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/args_tools.py", line 936, in create_engine_config
parallel_config = ParallelConfig(
File "/usr/local/lib/python3.10/dist-packages/aphrodite/common/config.py", line 963, in __init__
raise ValueError("Unable to load Ray which is "
ValueError: Unable to load Ray which is required for multi-node inference, please install Ray with `pip install ray`.
With --distributed-executor-backend=mp it seems to work, after I set CUDA_VISIBLE_DEVICES=0 and --tensor-parallel-size 1:
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "CUDA_VISIBLE_DEVICES=0" \
-p 2242:2242 \
--ipc=host \
alpindale/aphrodite-openai:latest \
--model NousResearch/Meta-Llama-3.1-8B-Instruct \
--tensor-parallel-size 1 \
--api-keys "sk-empty" \
--distributed-executor-backend=mp
INFO: Multiprocessing frontend to use
ipc:///tmp/6613166f-863d-42db-98cc-5c78ae5f00a4 for RPC Path.
INFO: Started engine process with PID 44
WARNING: The model has a long context length (131072). This may cause OOM
errors during the initial memory profiling phase, or result in low performance
due to small KV cache space. Consider setting --max-model-len to a smaller
value.
INFO:
--------------------------------------------------------------------------------
-----
INFO: Initializing Aphrodite Engine (v0.6.4.post1 commit 20f11fd0) with the
following config:
INFO: Model = 'NousResearch/Meta-Llama-3.1-8B-Instruct'
INFO: DataType = torch.bfloat16
INFO: Tensor Parallel Size = 1
INFO: Pipeline Parallel Size = 1
INFO: Disable Custom All-Reduce = False
INFO: Context Length = 131072
INFO: Enforce Eager Mode = False
INFO: Prefix Caching = False
INFO: Device = device(type='cuda')
INFO: Guided Decoding Backend =
DecodingConfig(guided_decoding_backend='lm-format-enforcer')
INFO:
--------------------------------------------------------------------------------
-----
WARNING: Reducing Torch parallelism from 12 threads to 1 to avoid unnecessary
CPU contention. Set OMP_NUM_THREADS in the external environment to tune this
value as needed.
INFO: Loading model NousResearch/Meta-Llama-3.1-8B-Instruct...
INFO: Using model weights format ['*.safetensors']
The problem is that the Docker example has --tensor-parallel-size 8 (so 8 GPUs, from my understanding), while the docs say:
--distributed-executor-backend {ray,mp} Category: Parallel Options Backend to use for distributed serving. When more than 1 GPU is used, will be automatically set to "ray" if installed or "mp" (multiprocessing) otherwise.
But in the code, if Ray is not found while multiple GPUs are requested, it raises the ValueError instead of switching to "mp".
One simple "fix" would be to adjust the Docker example on the main page to use only one GPU, with a note that Ray is required for multiple GPUs. Another would be to actually fall back to "mp" when Ray is not found, as the docs describe.
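The documented fallback could be sketched roughly like this. This is a minimal sketch, not Aphrodite's actual code: `resolve_executor_backend` is a hypothetical helper, and per the traceback the real check lives in `ParallelConfig.__init__` in aphrodite/common/config.py.

```python
import importlib.util


def resolve_executor_backend(requested_backend, world_size):
    """Hypothetical sketch of the documented behavior: auto-select
    "ray" if installed, otherwise fall back to "mp", and only raise
    when Ray was explicitly requested but is missing."""
    ray_available = importlib.util.find_spec("ray") is not None

    if requested_backend is not None:
        if requested_backend == "ray" and not ray_available:
            raise ValueError(
                "Ray backend was explicitly requested but Ray is not "
                "installed; install it with `pip install ray`.")
        return requested_backend

    if world_size > 1:
        # Docs: "will be automatically set to 'ray' if installed
        # or 'mp' (multiprocessing) otherwise."
        return "ray" if ray_available else "mp"
    return "mp"
```

With this logic, `--tensor-parallel-size 8` on an image without Ray would quietly select "mp" rather than crashing, while `--distributed-executor-backend=ray` would still fail loudly.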