[Llama3] Error when multiple GPUs are used
The following error appears when running the LLM reference implementation with multiple GPUs:
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] Exception in worker VllmWorkerProcess while processing method init_device: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method, Traceback (most recent call last):
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] File "/home/zhihanj/.local/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 224, in _run_worker_process
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] File "/home/zhihanj/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 166, in init_device
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] torch.cuda.set_device(self.device)
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] File "/home/zhihanj/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 420, in set_device
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] torch._C._cuda_setDevice(device)
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] File "/home/zhihanj/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 300, in _lazy_init
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] raise RuntimeError(
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
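The traceback itself points at the cause: CUDA is already initialized in the parent process before vLLM forks its tensor-parallel worker processes, so the workers must be started with the `spawn` method instead of `fork`. Below is a minimal sketch of a workaround, assuming the installed vLLM version honours the `VLLM_WORKER_MULTIPROC_METHOD` environment variable; the model name and `tensor_parallel_size` are placeholders, not values prescribed by the reference implementation.

```python
import os

# Assumption: this vLLM build reads VLLM_WORKER_MULTIPROC_METHOD.
# It must be set before vllm is imported / CUDA is touched in this process.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

import multiprocessing as mp

if __name__ == "__main__":
    # Belt-and-braces: also ask Python's multiprocessing for the 'spawn'
    # start method, as the RuntimeError above suggests.
    mp.set_start_method("spawn", force=True)

    from vllm import LLM

    # tensor_parallel_size > 1 is what spawns the worker processes that
    # trigger the error; the model name here is only a placeholder.
    llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct",
              tensor_parallel_size=2)
    print(llm.generate(["Hello"]))
```

Exporting `VLLM_WORKER_MULTIPROC_METHOD=spawn` in the shell before launching the benchmark should have the same effect, since the key point is that the variable is set before any CUDA initialization happens in the parent process.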