llama-stack
pytorch CUDA not found in host that has CUDA with working pytorch
I am getting this error.
ValueError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/parallel_utils.py", line 285, in launch_dist_group
    elastic_launch(launch_config, entrypoint=worker_process_entrypoint)(
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
worker_process_entrypoint FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-10-16_10:06:03
  host      : llama-stack-llama3-2-11b-vision-54cf7f9bfd-rz58g
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 18)
  error_file: /tmp/torchelastic_5pr8utde/018126eb-03bf-42ad-add7-00c1e0e4ec6a_dp_sgnfv/attempt_0/0/error.json
  traceback : Traceback (most recent call last):
    File "/usr/local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
      return f(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/parallel_utils.py", line 240, in worker_process_entrypoint
      model = init_model_cb()
    File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/model_parallel.py", line 39, in init_model_cb
      llama = Llama.build(config)
    File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/generation.py", line 83, in build
      torch.distributed.init_process_group("nccl")
    File "/usr/local/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
      return func(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 93, in wrapper
      func_return = func(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1368, in init_process_group
      default_pg, _ = _new_process_group_helper(
    File "/usr/local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1594, in _new_process_group_helper
      backend_class = ProcessGroupNCCL(
    ValueError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
============================================================
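Before digging into the image itself, it can help to check from inside the failing container whether any GPU devices are visible at all. The following is a stdlib-only sketch (not llama-stack code); the paths and environment variables it checks are the standard artifacts the NVIDIA container runtime injects:

```python
import glob
import os

def gpu_visibility_report():
    """Collect basic evidence of GPU visibility inside a container.

    Checks for NVIDIA device nodes and the environment variables the
    NVIDIA container runtime normally injects. Returns a dict of findings.
    """
    return {
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "driver_proc_entry": os.path.exists("/proc/driver/nvidia/version"),
    }

if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

If device_nodes comes back empty inside the llama-stack container but is populated in a working CUDA Pod on the same node, the GPU was never exposed to the container, regardless of what the image contains.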
Context
I built the image with llama-stack like this:
- clone the repo at master
- add --platform linux/amd64 to the docker command
- build llama-stack into a venv, then run:
./env/bin/llama stack build --template local --image-type docker --name llama-stack
CUDA environment
I confirmed that the CUDA drivers are present using test CUDA images.
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L4 Off | 00000000:00:03.0 Off | 0 |
| N/A 41C P8 17W / 72W | 1MiB / 23034MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
and a sample CUDA Pod works too:
$ kubectl -n ml logs vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Pod manifests:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-info
  namespace: ml
spec:
  restartPolicy: OnFailure
  containers:
    - name: main
      image: cuda:12.4.1-cudnn-devel-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: vector-add
  namespace: ml
spec:
  restartPolicy: OnFailure
  containers:
    - name: main
      image: cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
CUDA + PyTorch
I confirmed that PyTorch sees CUDA on this host.
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-cuda
  namespace: ml
spec:
  containers:
    - name: main
      image: pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel
      command: ["/bin/sh", "-c", "sleep 1000000"]
      resources:
        limits:
          nvidia.com/gpu: 1
$ kubectl exec -n ml --stdin --tty pytorch-cuda -- /bin/bash
root@pytorch-cuda:/workspace# python3
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.current_device()
0
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'NVIDIA L4'
>>>
root@pytorch-cuda:/workspace#
I suspect something is wrong with the Docker images that llama stack builds; perhaps they do not include CUDA by default?
Our llamastack-local-gpu Docker image comes with CUDA, while llamastack-local-cpu does not. What command are you using to start up the llama stack distribution? You may need to add the --gpus=all flag.
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack-local-gpu
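Note that --gpus=all is a plain docker run flag; in Kubernetes the equivalent is the nvidia.com/gpu resource limit already used in the test Pods above. A sketch of how the llama-stack Pod could request a GPU and mirror the docker run flags (the image name and hostPath are illustrative, not confirmed by the maintainers):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llama-stack
  namespace: ml
spec:
  containers:
    - name: main
      image: llamastack-local-gpu   # illustrative; use your built image reference
      ports:
        - containerPort: 5000       # mirrors -p 5000:5000
      volumeMounts:
        - name: llama-home
          mountPath: /root/.llama
      resources:
        limits:
          nvidia.com/gpu: 1         # Kubernetes equivalent of --gpus=all
  volumes:
    - name: llama-home
      hostPath:
        path: /root/.llama          # mirrors -v ~/.llama:/root/.llama
```

With the device plugin installed on the node, this limit is what triggers the NVIDIA runtime to expose the GPU to the container; omitting it reproduces exactly the "no GPUs found" symptom even on a GPU host.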
how I start
I am starting the container in a K8S Pod without a command or arguments, relying on the default entrypoint in the container.
llamastack-local-gpu
I don't think llamastack-local-gpu is used inside the image built with ./env/bin/llama stack build --template local --image-type docker --name llama-stack. Inspecting the Docker commands inside the image that llama stack builds shows no references to CUDA or NVIDIA in any layer or command. So far it appears to me that llama stack builds the image with no GPU support.
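The layer inspection described above can be sketched as a small scan over docker history output. This is a hypothetical helper, not llama-stack code; it assumes the docker CLI is available when run against a real image:

```python
import subprocess

def find_gpu_references(history_lines):
    """Return the layer-command lines that mention CUDA or NVIDIA."""
    keywords = ("cuda", "nvidia")
    return [
        line for line in history_lines
        if any(k in line.lower() for k in keywords)
    ]

def image_history(image):
    """Fetch layer commands for an image via `docker history` (requires docker)."""
    out = subprocess.run(
        ["docker", "history", "--no-trunc", "--format", "{{.CreatedBy}}", image],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

if __name__ == "__main__":
    layers = image_history("llamastack-local-gpu")
    print(find_gpu_references(layers) or "no CUDA/NVIDIA references found")
```

An empty result on the llama-stack image, versus many hits on the pytorch/pytorch image, would support the conclusion that the built image has no GPU support baked in.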
A Hugging Face PyTorch image like this does work with CUDA, so the host is OK, the model is OK, and PyTorch+CUDA is OK. Clearly something is wrong with the llama stack images.
Here is a working CUDA + PyTorch + Llama 3.2 11B Vision server: https://github.com/nikolaydubina/basic-openai-pytorch-server
By the way, it is just 3 files and 100 lines of code.
This issue has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant!