Christopher Desiniotis
Hi everyone, this issue is not a bug with the NVIDIA driver container. The driver container requires that the kernel headers for the running kernel are present and can be accessed by the...
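As a rough illustration of what to check (the package names below assume a yum-based distribution such as CentOS/RHEL and are not taken from this thread), you can confirm on the host that headers matching the running kernel are actually available:

```sh
# Show the running kernel, then check whether matching header/devel packages can be installed.
uname -r
yum list available kernel-headers-$(uname -r) kernel-devel-$(uname -r)
```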
We do not support Rocky Linux. Please refer to our platform support page for all the operating systems we currently support: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/platform-support.html#linux-distributions
Could you provide logs of the dcgm pod? Also, can you try deploying the operator again with dcgm disabled: `--set dcgm.enabled=false`?
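For example (the namespace and release name below are placeholders, not values from this issue):

```sh
# Collect logs from the dcgm pod, then redeploy with dcgm disabled.
kubectl logs -n gpu-operator-resources nvidia-dcgm-kfqch
helm upgrade --install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --set dcgm.enabled=false
```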
The dcgm pod `nvidia-dcgm-kfqch` runs DCGM; the dcgm-exporter pod collects GPU metrics from it and exports them to Prometheus.
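As a quick sanity check that the metrics pipeline is working (the service name and port below are the dcgm-exporter defaults and may differ in your deployment):

```sh
# Port-forward the exporter and look for a well-known DCGM metric.
kubectl -n gpu-operator-resources port-forward svc/nvidia-dcgm-exporter 9400:9400 &
curl -s http://localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL
```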
Hi @kralicky -- this is the expected behavior. You need a license per VM. It looks like you have 4 VMs, and so 4 licenses should be leased.
Hi @yug0slav, can you try naming your repo configuration files `CentOS-Vault.repo` and `cuda.repo` and retrying? Naming them this way will replace the existing repo configuration files in the driver...
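A minimal sketch of one way to get files with those exact names into the cluster (the ConfigMap and namespace names are placeholders; wiring the ConfigMap to the driver container follows the GPU Operator's custom-repository documentation):

```sh
# Create a ConfigMap whose keys match the desired file names exactly.
kubectl create configmap repo-config \
  --namespace gpu-operator \
  --from-file=CentOS-Vault.repo \
  --from-file=cuda.repo
```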
Thanks for the additional details. If I am understanding you correctly, there are two issues; correct me if I am wrong.

1. On CentOS 7, you have to name your repo...
@chrisholzheimer

> apt seem to only raise a warning, but it seems nvidia-driver-daemonset pod still need packages from official repos to work

Yes, these packages are required for the driver...
> but it does not work as expected: default sources.list stays in use as well.

Can you provide driver logs for this case? It is expected behavior for the default...
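For instance (namespace and pod name are placeholders for whatever your deployment uses):

```sh
# Find the driver pod on the affected node, then collect its logs.
kubectl get pods -n gpu-operator-resources -o wide | grep nvidia-driver-daemonset
kubectl logs -n gpu-operator-resources <nvidia-driver-daemonset-pod>
```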
Can you confirm that `/etc/apt/sources.list.d/` gets created inside the driver container and that your custom repo file can be found in that directory? Edit: Can you also confirm that your repo...
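One way to check this from outside the container (the pod name and the `custom.list` file name are placeholders):

```sh
# List the mounted repo configuration inside the running driver container.
kubectl exec -n gpu-operator-resources <nvidia-driver-daemonset-pod> -- ls -l /etc/apt/sources.list.d/
kubectl exec -n gpu-operator-resources <nvidia-driver-daemonset-pod> -- cat /etc/apt/sources.list.d/custom.list
```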