update to CUDA v12.2 libraries in docker container?

Open juancaoviedo opened this issue 1 year ago • 4 comments

Hi,

I'm deploying ollama in a self-hosted kubernetes cluster using https://github.com/otwld/ollama-helm. However, when the pod starts, it is not able to find the GPUs. I have the following logs:

time=2024-08-15T15:39:41.512Z level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-15T15:39:41.513Z level=DEBUG source=gpu.go:90 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-08-15T15:39:41.513Z level=DEBUG source=gpu.go:472 msg="Searching for GPU library" name=libcuda.so*
time=2024-08-15T15:39:41.513Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcuda.so** /usr/local/nvidia/lib64/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-08-15T15:39:41.513Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[]
time=2024-08-15T15:39:41.513Z level=DEBUG source=gpu.go:472 msg="Searching for GPU library" name=libcudart.so*
time=2024-08-15T15:39:41.513Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcudart.so** /usr/local/nvidia/lib64/libcudart.so** /tmp/ollama869795597/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-08-15T15:39:41.514Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[/tmp/ollama869795597/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-08-15T15:39:41.515Z level=DEBUG source=gpu.go:537 msg="Unable to load cudart" library=/tmp/ollama869795597/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-08-15T15:39:41.515Z level=DEBUG source=amd_linux.go:371 msg="amdgpu driver not detected /sys/module/amdgpu"
time=2024-08-15T15:39:41.515Z level=INFO source=gpu.go:350 msg="no compatible GPUs were discovered"

The info of the nvidia-smi of my system is the following:

NVIDIA-SMI 535.183.01
Driver Version: 535.183.01
CUDA Version: 12.2

The values of the Helm chart I use are:

ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true

    # -- GPU type: 'nvidia' or 'amd'
    # If 'ollama.gpu.enabled', default value is nvidia
    # If set to 'amd', this will add the 'rocm' suffix to the image tag if 'image.tag' is not overridden
    # This is because AMD and CPU/CUDA are different images
    type: 'nvidia'

    # -- Specify the number of GPUs
    number: 1

    # -- only for nvidia cards; change to (example) 'nvidia.com/mig-1g.10gb' to use MIG slice
    nvidiaResource: "nvidia.com/gpu"
    # nvidiaResource: "nvidia.com/mig-1g.10gb" # example
....
extraEnv:
  - name: NVIDIA_DRIVER_CAPABILITIES
    value: compute, utility
  - name: NVIDIA_VISIBLE_DEVICES
    value: all
  - name: OLLAMA_DEBUG
    value: "1"



In this issue https://github.com/ollama/ollama/issues/2670 @dhiltgen mentions the following: "CUDA v11 libraries are currently embedded within the ollama linux binary and are extracted at runtime". So, my problem might be related to CUDA version compatibility. I cannot downgrade the CUDA version of the cluster because other services use the GPUs as well (with CUDA 12.2).

Is there a way to solve this? Should I rebuild the docker image with the right CUDA version? Could you guide me on how to do so? Are there plans to update the CUDA version in future releases?

Thanks in advance,

juancaoviedo avatar Aug 16 '24 02:08 juancaoviedo

I'm running ollama in a docker container on a system with CUDA 12.2 without a problem. What's the output when you run nvidia-smi inside the ollama container? Is the nvidia runtime configured for kubernetes?
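
For reference, one way to run that check from the cluster, assuming kubectl access (the pod name and namespace are placeholders):

kubectl exec -n <namespace> <ollama-pod> -- nvidia-smi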

rick-github avatar Aug 16 '24 11:08 rick-github

Hi @rick-github, thanks for your answer. If I run the command on the system/cluster, it works and I get results. If I run the nvidia-smi command inside the ollama container I get the following output:

bash: nvidia-smi: command not found

The Nvidia runtime works for the cluster, because other services use the GPUs (specifically, this service: https://github.com/iot-salzburg/gpu-jupyter). The gpu-jupyter service that uses the GPUs is able to run the nvidia-smi command inside the container/pod; here you can see the output:

(base) jovyan@jupyter-0000-2d0002-2d1286-2d2869:~$ nvidia-smi
Fri Aug 16 13:14:27 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      Off | 00000000:AF:00.0 Off |                  Off |
| N/A   57C    P0              57W / 250W |    584MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla P40                      Off | 00000000:D8:00.0 Off |                  Off |
| N/A   33C    P8              10W / 250W |      2MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

It's only the ollama container/pod that cannot execute the nvidia-smi command or find the GPUs. I do not think the problem is containerd, since the Jupyter service with GPUs is working properly. I feel the problem is really in the CUDA versions, v11 vs v12.2. Therefore, I'm going to try to build a new docker image, changing the Dockerfile to use CUDA 12.2.2 (which exists for the devel-centos7 and devel-rockylinux8 images). If that works, I'll report it here.
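
For completeness, a couple of generic checks that can be run to confirm the runtime and device plugin are in place (the containerd config path is the common default and the node name is a placeholder; they may differ in your setup):

# check whether the nvidia runtime is registered with containerd on the node
grep -n nvidia /etc/containerd/config.toml
# check whether the node advertises GPU resources to kubernetes
kubectl describe node <gpu-node> | grep -i 'nvidia.com/gpu'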

Thanks,

juancaoviedo avatar Aug 16 '24 13:08 juancaoviedo

I don't know anything about kubernetes, but these are my observations with docker.

If I start a new container with the gpu runtime, the required libraries and commands are populated:

$ docker run --rm --gpus all --entrypoint bash ollama/ollama:0.3.5 -c 'ls -l /usr/lib/x86_64-linux-gnu/libcuda* /usr/bin/nvidia-smi'
-rwxr-xr-x 1 root root   644112 Oct 25  2023 /usr/bin/nvidia-smi
lrwxrwxrwx 1 root root       12 Aug 16 14:07 /usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root       21 Aug 16 14:07 /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.525.147.05
-rw-r--r-- 1 root root 29867944 Oct 25  2023 /usr/lib/x86_64-linux-gnu/libcuda.so.525.147.05
lrwxrwxrwx 1 root root       29 Aug 16 14:07 /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1 -> libcudadebugger.so.525.147.05
-rw-r--r-- 1 root root 10490248 Oct 25  2023 /usr/lib/x86_64-linux-gnu/libcudadebugger.so.525.147.05

If I start the container without the gpu runtime, these files are not present:

$ docker run --rm  --entrypoint bash ollama/ollama:0.3.5 -c 'ls -l /usr/lib/x86_64-linux-gnu/libcuda* /usr/bin/nvidia-smi'
ls: cannot access '/usr/lib/x86_64-linux-gnu/libcuda*': No such file or directory
ls: cannot access '/usr/bin/nvidia-smi': No such file or directory

The fact that nvidia-smi is not present in your container leads me to believe there is an issue with the runtime for that container.

rick-github avatar Aug 16 '24 14:08 rick-github

I am using a Tesla P40 but I have an intermittent issue. I can run nvidia-smi within the ollama container, but it struggles to load the model once it has unloaded it to save power! I ended up giving up on this and installed the non-docker version.

fahadshery avatar Aug 16 '24 23:08 fahadshery

@juancaoviedo updating to v12 doesn't sound like it will address your container runtime issue. If you can get nvidia-smi running inside a pod in your cluster, then ollama should work on the GPU.

That said, PR #5049 will provide v12 in the container image.

dhiltgen avatar Aug 18 '24 15:08 dhiltgen

Hi, thanks for your answers,

My experiment of building the images for v12 didn't work, because I was not able to build the images on my machine. But I saw that PR https://github.com/ollama/ollama/pull/5049 was accepted, so I'll try again with the newer images. However, the fact that the nvidia-smi command executes properly in the jupyter container/pod while the same command does not exist in the ollama container seems odd. I'll post my full logs here to see if maybe there is something else I'm missing.

2024/08/19 18:16:06 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES:1 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-19T18:16:06.416Z level=INFO source=images.go:782 msg="total blobs: 0"
time=2024-08-19T18:16:06.416Z level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-19T18:16:06.416Z level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.6)"
time=2024-08-19T18:16:06.417Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2982119330/runners
time=2024-08-19T18:16:06.418Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-08-19T18:16:06.418Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-08-19T18:16:06.418Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-08-19T18:16:06.418Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-08-19T18:16:06.418Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-08-19T18:16:06.418Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-08-19T18:16:06.418Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-08-19T18:16:06.418Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
time=2024-08-19T18:16:06.418Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
time=2024-08-19T18:16:11.441Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2982119330/runners/cpu/ollama_llama_server
time=2024-08-19T18:16:11.441Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2982119330/runners/cpu_avx/ollama_llama_server
time=2024-08-19T18:16:11.441Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2982119330/runners/cpu_avx2/ollama_llama_server
time=2024-08-19T18:16:11.441Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2982119330/runners/cuda_v11/ollama_llama_server
time=2024-08-19T18:16:11.441Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2982119330/runners/rocm_v60102/ollama_llama_server
time=2024-08-19T18:16:11.441Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60102]"
time=2024-08-19T18:16:11.441Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-08-19T18:16:11.441Z level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-08-19T18:16:11.441Z level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-19T18:16:11.441Z level=DEBUG source=gpu.go:90 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-08-19T18:16:11.441Z level=DEBUG source=gpu.go:472 msg="Searching for GPU library" name=libcuda.so*
time=2024-08-19T18:16:11.441Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcuda.so** /usr/local/nvidia/lib64/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-08-19T18:16:11.442Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[]
time=2024-08-19T18:16:11.442Z level=DEBUG source=gpu.go:472 msg="Searching for GPU library" name=libcudart.so*
time=2024-08-19T18:16:11.442Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcudart.so** /usr/local/nvidia/lib64/libcudart.so** /tmp/ollama2982119330/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-08-19T18:16:11.443Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[/tmp/ollama2982119330/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-08-19T18:16:11.443Z level=DEBUG source=gpu.go:537 msg="Unable to load cudart" library=/tmp/ollama2982119330/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-08-19T18:16:11.443Z level=DEBUG source=amd_linux.go:371 msg="amdgpu driver not detected /sys/module/amdgpu"
time=2024-08-19T18:16:11.443Z level=INFO source=gpu.go:350 msg="no compatible GPUs were discovered"

Thanks,

juancaoviedo avatar Aug 19 '24 18:08 juancaoviedo

The logs inside the container just show that the container can't access the GPU. If you can use docker commands on the containers, what does the following show (substitute the name of your ollama container for ollama):

docker inspect ollama | jq '.[0].HostConfig.DeviceRequests'

On my ollama container, it shows the following:

[
  {
    "Driver": "nvidia",
    "Count": 1,
    "DeviceIDs": null,
    "Capabilities": [
      [
        "gpu"
      ]
    ],
    "Options": null
  }
]

How do you bring the container up? Is management done via ollama-helm? Are there logs from the management system that show the steps it takes to bring up the ollama container?
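
For a pod managed by kubernetes rather than plain docker, a rough equivalent of the inspect check above (the pod name and namespace are placeholders) might be:

# show the runtime class and GPU resource requests of the ollama pod
kubectl get pod <ollama-pod> -n <namespace> -o yaml | grep -E 'runtimeClassName|nvidia.com/gpu'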

rick-github avatar Aug 19 '24 19:08 rick-github

Hi @rick-github,

Thanks for your answer. Yes, indeed, the pod/container was not able to access the GPUs at all. It was a missing configuration in the values.yaml file of the Helm chart. This post was helpful for finding the issue: https://www.jimangel.io/posts/nvidia-rtx-gpu-kubernetes-setup/

So, the problem was not the version of the CUDA libraries; the problem was a variable in the Helm chart called runtimeClassName. This variable is empty by default, and in my case it had to be set to nvidia.

So, as suggested by you and @dhiltgen, the problem was in the container runtime. In my case, I modified the Helm chart values in the following way:

# -- Specify runtime class
runtimeClassName: "nvidia"
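
For reference, this points to a RuntimeClass named nvidia, which is typically created when the NVIDIA GPU Operator or the nvidia container toolkit is set up on the cluster. It can be verified with:

kubectl get runtimeclass

and, if it has to be created manually, a minimal manifest looks roughly like this:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia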

And the final logs now look like this:

2024/08/19 19:09:13 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES:1 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-19T19:09:13.799Z level=INFO source=images.go:782 msg="total blobs: 0"
time=2024-08-19T19:09:13.799Z level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-19T19:09:13.799Z level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.6)"
time=2024-08-19T19:09:13.801Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3671039109/runners
time=2024-08-19T19:09:13.801Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-08-19T19:09:13.801Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-08-19T19:09:13.801Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-08-19T19:09:13.801Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-08-19T19:09:13.802Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-08-19T19:09:13.802Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-08-19T19:09:13.802Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-08-19T19:09:13.802Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
time=2024-08-19T19:09:13.802Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
time=2024-08-19T19:09:18.832Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3671039109/runners/cpu/ollama_llama_server
time=2024-08-19T19:09:18.832Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3671039109/runners/cpu_avx/ollama_llama_server
time=2024-08-19T19:09:18.832Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3671039109/runners/cpu_avx2/ollama_llama_server
time=2024-08-19T19:09:18.832Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3671039109/runners/cuda_v11/ollama_llama_server
time=2024-08-19T19:09:18.832Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3671039109/runners/rocm_v60102/ollama_llama_server
time=2024-08-19T19:09:18.832Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60102]"
time=2024-08-19T19:09:18.832Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-08-19T19:09:18.832Z level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-08-19T19:09:18.832Z level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-19T19:09:18.832Z level=DEBUG source=gpu.go:90 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-08-19T19:09:18.832Z level=DEBUG source=gpu.go:472 msg="Searching for GPU library" name=libcuda.so*
time=2024-08-19T19:09:18.832Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcuda.so** /usr/local/nvidia/lib64/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-08-19T19:09:18.833Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.535.183.01]
CUDA driver version: 12.2
time=2024-08-19T19:09:18.878Z level=DEBUG source=gpu.go:123 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.535.183.01

So, this solved my problem. Now the GPUs are accessible from the pod/container. Thanks for your help!

juancaoviedo avatar Aug 19 '24 19:08 juancaoviedo