
no runtime for "nvidia" is configured

Open yanis-incepto opened this issue 1 year ago • 8 comments

1. Quick Debug Information

  • OS/Version(e.g. RHEL8.6, Ubuntu22.04): Ubuntu 20.04
  • Kernel Version: Kubernetes 1.24.14
  • Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): containerd
  • K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): Kops 1.24.1
  • GPU Operator Version: 23.9.2

2. Issue or feature description

kubectl describe pod nvidia-device-plugin-daemonset-w72xb -n gpu-operator
.... 
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               2m11s                 default-scheduler  Successfully assigned gpu-operator/nvidia-device-plugin-daemonset-w72xb to i-071a4e5a302e4025b
  Warning  FailedCreatePodSandBox  12s (x10 over 2m11s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "nvidia" is configured

It looks like the runtime isn't being found, even though the nvidia RuntimeClass exists:

kubectl get runtimeclasses.node.k8s.io                                               
NAME     HANDLER   AGE
nvidia   nvidia    7d1h 
kubectl describe runtimeclasses.node.k8s.io nvidia                                   
Name:         nvidia
Namespace:    
Labels:       app.kubernetes.io/component=gpu-operator
Annotations:  <none>
API Version:  node.k8s.io/v1
Handler:      nvidia
Kind:         RuntimeClass
Metadata:
  Creation Timestamp:  2024-05-27T08:53:18Z
  Owner References:
    API Version:           nvidia.com/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  ClusterPolicy
    Name:                  cluster-policy
    UID:                   2c237c3d-07eb-4856-8316-046489793e3d
  Resource Version:        265073642
  UID:                     26fd5054-7344-4e6d-9029-a610ae0df560
Events:                    <none>
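
For reference, a hedged aside: the RuntimeClass object only names a handler; the "no runtime for nvidia is configured" error comes from containerd on the node, which must have a runtime entry with the same name in its own configuration. A quick way to list the runtime handlers containerd actually knows about, assuming crictl is available on the node (the exact JSON layout varies by containerd version, and the jq variant is optional):

sudo crictl info | jq '.config.containerd.runtimes | keys'
# without jq:
sudo crictl info | grep -i nvidia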

3. Steps to reproduce the issue

I installed the chart with helmfile

4. Information to attach (optional if deemed irrelevant)

kubernetes pods status: kubectl get pods -n OPERATOR_NAMESPACE

 kubectl get pods -n gpu-operator                                                  
NAME                                                         READY   STATUS     RESTARTS   AGE
gpu-feature-discovery-spbbk                                  0/1     Init:0/1   0          41s
gpu-operator-d97f85598-j7qt4                                 1/1     Running    0          7d1h
gpu-operator-node-feature-discovery-gc-84c477b7-67tk8        1/1     Running    0          6d20h
gpu-operator-node-feature-discovery-master-cb8bb7d48-x4hqj   1/1     Running    0          6d20h
gpu-operator-node-feature-discovery-worker-jfdsh             1/1     Running    0          85s
nvidia-container-toolkit-daemonset-vb6qn                     0/1     Init:0/1   0          41s
nvidia-dcgm-exporter-9xmbm                                   0/1     Init:0/1   0          41s
nvidia-device-plugin-daemonset-w72xb                         0/1     Init:0/1   0          41s
nvidia-driver-daemonset-v4n96                                0/1     Running    0          73s
nvidia-operator-validator-vbq6v                              0/1     Init:0/4   0          41s

kubernetes daemonset status: kubectl get ds -n OPERATOR_NAMESPACE

 kubectl get ds -n gpu-operator                                                       
NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                          AGE
gpu-feature-discovery                        1         1         0       1            0           nvidia.com/gpu.deploy.gpu-feature-discovery=true                       7d
gpu-operator-node-feature-discovery-worker   1         1         1       1            1           instance-type=gpu                                                      6d20h
nvidia-container-toolkit-daemonset           1         1         0       1            0           nvidia.com/gpu.deploy.container-toolkit=true                           7d
nvidia-dcgm-exporter                         1         1         0       1            0           nvidia.com/gpu.deploy.dcgm-exporter=true                               7d
nvidia-device-plugin-daemonset               1         1         0       1            0           nvidia.com/gpu.deploy.device-plugin=true                               7d
nvidia-device-plugin-mps-control-daemon      0         0         0       0            0           nvidia.com/gpu.deploy.device-plugin=true,nvidia.com/mps.capable=true   7d
nvidia-driver-daemonset                      1         1         0       1            0           nvidia.com/gpu.deploy.driver=true                                      7d
nvidia-mig-manager                           0         0         0       0            0           nvidia.com/gpu.deploy.mig-manager=true                                 7d
nvidia-operator-validator                    1         1         0       1            0           nvidia.com/gpu.deploy.operator-validator=true                          7d

If a pod/ds is in an error state or pending state kubectl describe pod -n OPERATOR_NAMESPACE POD_NAME

kubectl describe pod nvidia-device-plugin-daemonset-w72xb -n gpu-operator
.... 
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               2m11s                 default-scheduler  Successfully assigned gpu-operator/nvidia-device-plugin-daemonset-w72xb to i-071a4e5a302e4025b
  Warning  FailedCreatePodSandBox  12s (x10 over 2m11s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "nvidia" is configured

Output from running nvidia-smi from the driver container: kubectl exec DRIVER_POD_NAME -n OPERATOR_NAMESPACE -c nvidia-driver-ctr -- nvidia-smi

kubectl exec nvidia-driver-daemonset-v4n96 -n gpu-operator -c nvidia-driver-ctr -- nvidia-smi
Mon Jun  3 10:01:38 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:1E.0 Off |                    0 |
| N/A   30C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

yanis-incepto avatar Jun 03 '24 10:06 yanis-incepto

@yanis-incepto the nvidia-container-toolkit-daemonset-vb6qn is stuck in the init state and has not yet configured the nvidia runtime in containerd. Could you provide the logs for the containers in this daemonset?

elezar avatar Jun 03 '24 10:06 elezar
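
For reference, a hedged example of pulling those logs, using the pod name from the listing above and the container names reported by kubectl (nvidia-container-toolkit-ctr plus the driver-validation init container):

kubectl logs -n gpu-operator nvidia-container-toolkit-daemonset-vb6qn -c driver-validation
kubectl logs -n gpu-operator nvidia-container-toolkit-daemonset-vb6qn -c nvidia-container-toolkit-ctr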

nvidia-container-toolkit is finally running after some time, but the other pods still show the same error (and it never goes away; I tried leaving everything running for a few hours):

kubectl get pods -n gpu-operator                                                     
NAME                                                         READY   STATUS     RESTARTS   AGE
gpu-feature-discovery-t4bv8                                  0/1     Init:0/1   0          10m
gpu-operator-d97f85598-j7qt4                                 1/1     Running    0          7d1h
gpu-operator-node-feature-discovery-gc-84c477b7-67tk8        1/1     Running    0          6d21h
gpu-operator-node-feature-discovery-master-cb8bb7d48-x4hqj   1/1     Running    0          6d21h
gpu-operator-node-feature-discovery-worker-fcwh7             1/1     Running    0          10m
nvidia-container-toolkit-daemonset-gn495                     1/1     Running    0          10m
nvidia-dcgm-exporter-wnhss                                   0/1     Init:0/1   0          10m
nvidia-device-plugin-daemonset-dwwqr                         0/1     Init:0/1   0          10m
nvidia-driver-daemonset-p47wp                                1/1     Running    0          10m
nvidia-operator-validator-zk4mv                              0/1     Init:0/4   0          10m

As for its logs, it looks like it is waiting for a signal:

kubectl logs -n gpu-operator nvidia-container-toolkit-daemonset-gn495                
Defaulted container "nvidia-container-toolkit-ctr" out of: nvidia-container-toolkit-ctr, driver-validation (init)
time="2024-06-03T10:46:29Z" level=info msg="Parsing arguments"
time="2024-06-03T10:46:29Z" level=info msg="Starting nvidia-toolkit"
time="2024-06-03T10:46:29Z" level=info msg="Verifying Flags"
time="2024-06-03T10:46:29Z" level=info msg=Initializing
time="2024-06-03T10:46:29Z" level=info msg="Installing toolkit"
time="2024-06-03T10:46:29Z" level=info msg="disabling device node creation since --cdi-enabled=false"
time="2024-06-03T10:46:29Z" level=info msg="Installing NVIDIA container toolkit to '/usr/local/nvidia/toolkit'"
time="2024-06-03T10:46:29Z" level=info msg="Removing existing NVIDIA container toolkit installation"
time="2024-06-03T10:46:29Z" level=info msg="Creating directory '/usr/local/nvidia/toolkit'"
time="2024-06-03T10:46:29Z" level=info msg="Creating directory '/usr/local/nvidia/toolkit/.config/nvidia-container-runtime'"
time="2024-06-03T10:46:29Z" level=info msg="Installing NVIDIA container library to '/usr/local/nvidia/toolkit'"
time="2024-06-03T10:46:29Z" level=info msg="Finding library libnvidia-container.so.1 (root=)"
time="2024-06-03T10:46:29Z" level=info msg="Checking library candidate '/usr/lib64/libnvidia-container.so.1'"
time="2024-06-03T10:46:29Z" level=info msg="Skipping library candidate '/usr/lib64/libnvidia-container.so.1': error resolving link '/usr/lib64/libnvidia-container.so.1': lstat /usr/lib64/libnvidia-container.so.1: no such file or directory"
time="2024-06-03T10:46:29Z" level=info msg="Checking library candidate '/usr/lib/x86_64-linux-gnu/libnvidia-container.so.1'"
time="2024-06-03T10:46:29Z" level=info msg="Resolved link: '/usr/lib/x86_64-linux-gnu/libnvidia-container.so.1' => '/usr/lib/x86_64-linux-gnu/libnvidia-container.so.1.15.0'"
time="2024-06-03T10:46:29Z" level=info msg="Installing '/usr/lib/x86_64-linux-gnu/libnvidia-container.so.1.15.0' to '/usr/local/nvidia/toolkit/libnvidia-container.so.1.15.0'"
time="2024-06-03T10:46:29Z" level=info msg="Installed '/usr/lib/x86_64-linux-gnu/libnvidia-container.so.1.15.0' to '/usr/local/nvidia/toolkit/libnvidia-container.so.1.15.0'"
time="2024-06-03T10:46:29Z" level=info msg="Creating symlink '/usr/local/nvidia/toolkit/libnvidia-container.so.1' -> 'libnvidia-container.so.1.15.0'"
time="2024-06-03T10:46:29Z" level=info msg="Finding library libnvidia-container-go.so.1 (root=)"
time="2024-06-03T10:46:29Z" level=info msg="Checking library candidate '/usr/lib64/libnvidia-container-go.so.1'"
time="2024-06-03T10:46:29Z" level=info msg="Skipping library candidate '/usr/lib64/libnvidia-container-go.so.1': error resolving link '/usr/lib64/libnvidia-container-go.so.1': lstat /usr/lib64/libnvidia-container-go.so.1: no such file or directory"
time="2024-06-03T10:46:29Z" level=info msg="Checking library candidate '/usr/lib/x86_64-linux-gnu/libnvidia-container-go.so.1'"
time="2024-06-03T10:46:29Z" level=info msg="Resolved link: '/usr/lib/x86_64-linux-gnu/libnvidia-container-go.so.1' => '/usr/lib/x86_64-linux-gnu/libnvidia-container-go.so.1.15.0'"
time="2024-06-03T10:46:29Z" level=info msg="Installing '/usr/lib/x86_64-linux-gnu/libnvidia-container-go.so.1.15.0' to '/usr/local/nvidia/toolkit/libnvidia-container-go.so.1.15.0'"
time="2024-06-03T10:46:29Z" level=info msg="Installed '/usr/lib/x86_64-linux-gnu/libnvidia-container-go.so.1.15.0' to '/usr/local/nvidia/toolkit/libnvidia-container-go.so.1.15.0'"
time="2024-06-03T10:46:29Z" level=info msg="Creating symlink '/usr/local/nvidia/toolkit/libnvidia-container-go.so.1' -> 'libnvidia-container-go.so.1.15.0'"
time="2024-06-03T10:46:29Z" level=info msg="Installing executable '/usr/bin/nvidia-container-runtime' to /usr/local/nvidia/toolkit"
time="2024-06-03T10:46:29Z" level=info msg="Installing '/usr/bin/nvidia-container-runtime' to '/usr/local/nvidia/toolkit/nvidia-container-runtime.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed '/usr/local/nvidia/toolkit/nvidia-container-runtime.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed wrapper '/usr/local/nvidia/toolkit/nvidia-container-runtime'"
time="2024-06-03T10:46:29Z" level=info msg="Installing executable '/usr/bin/nvidia-container-runtime.cdi' to /usr/local/nvidia/toolkit"
time="2024-06-03T10:46:29Z" level=info msg="Installing '/usr/bin/nvidia-container-runtime.cdi' to '/usr/local/nvidia/toolkit/nvidia-container-runtime.cdi.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed '/usr/local/nvidia/toolkit/nvidia-container-runtime.cdi.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed wrapper '/usr/local/nvidia/toolkit/nvidia-container-runtime.cdi'"
time="2024-06-03T10:46:29Z" level=info msg="Installing executable '/usr/bin/nvidia-container-runtime.legacy' to /usr/local/nvidia/toolkit"
time="2024-06-03T10:46:29Z" level=info msg="Installing '/usr/bin/nvidia-container-runtime.legacy' to '/usr/local/nvidia/toolkit/nvidia-container-runtime.legacy.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed '/usr/local/nvidia/toolkit/nvidia-container-runtime.legacy.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed wrapper '/usr/local/nvidia/toolkit/nvidia-container-runtime.legacy'"
time="2024-06-03T10:46:29Z" level=info msg="Installing NVIDIA container CLI from '/usr/bin/nvidia-container-cli'"
time="2024-06-03T10:46:29Z" level=info msg="Installing executable '/usr/bin/nvidia-container-cli' to /usr/local/nvidia/toolkit"
time="2024-06-03T10:46:29Z" level=info msg="Installing '/usr/bin/nvidia-container-cli' to '/usr/local/nvidia/toolkit/nvidia-container-cli.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed '/usr/local/nvidia/toolkit/nvidia-container-cli.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed wrapper '/usr/local/nvidia/toolkit/nvidia-container-cli'"
time="2024-06-03T10:46:29Z" level=info msg="Installing NVIDIA container runtime hook from '/usr/bin/nvidia-container-runtime-hook'"
time="2024-06-03T10:46:29Z" level=info msg="Installing executable '/usr/bin/nvidia-container-runtime-hook' to /usr/local/nvidia/toolkit"
time="2024-06-03T10:46:29Z" level=info msg="Installing '/usr/bin/nvidia-container-runtime-hook' to '/usr/local/nvidia/toolkit/nvidia-container-runtime-hook.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed '/usr/local/nvidia/toolkit/nvidia-container-runtime-hook.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed wrapper '/usr/local/nvidia/toolkit/nvidia-container-runtime-hook'"
time="2024-06-03T10:46:29Z" level=info msg="Creating symlink '/usr/local/nvidia/toolkit/nvidia-container-toolkit' -> 'nvidia-container-runtime-hook'"
time="2024-06-03T10:46:29Z" level=info msg="Installing executable '/usr/bin/nvidia-ctk' to /usr/local/nvidia/toolkit"
time="2024-06-03T10:46:29Z" level=info msg="Installing '/usr/bin/nvidia-ctk' to '/usr/local/nvidia/toolkit/nvidia-ctk.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed '/usr/local/nvidia/toolkit/nvidia-ctk.real'"
time="2024-06-03T10:46:29Z" level=info msg="Installed wrapper '/usr/local/nvidia/toolkit/nvidia-ctk'"
time="2024-06-03T10:46:29Z" level=info msg="Installing NVIDIA container toolkit config '/usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml'"
time="2024-06-03T10:46:29Z" level=info msg="Skipping unset option: nvidia-container-runtime.modes.cdi.annotation-prefixes"
time="2024-06-03T10:46:29Z" level=info msg="Skipping unset option: nvidia-container-runtime.runtimes"
time="2024-06-03T10:46:29Z" level=info msg="Skipping unset option: nvidia-container-cli.debug"
time="2024-06-03T10:46:29Z" level=info msg="Skipping unset option: nvidia-container-runtime.debug"
time="2024-06-03T10:46:29Z" level=info msg="Skipping unset option: nvidia-container-runtime.log-level"
time="2024-06-03T10:46:29Z" level=info msg="Skipping unset option: nvidia-container-runtime.mode"
Using config:
accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"

[nvidia-container-cli]
  environment = []
  ldconfig = "@/run/nvidia/driver/sbin/ldconfig.real"
  load-kmods = true
  path = "/usr/local/nvidia/toolkit/nvidia-container-cli"
  root = "/run/nvidia/driver"

[nvidia-container-runtime]
  log-level = "info"
  mode = "auto"
  runtimes = ["docker-runc", "runc", "crun"]

  [nvidia-container-runtime.modes]

    [nvidia-container-runtime.modes.cdi]
      annotation-prefixes = ["cdi.k8s.io/"]
      default-kind = "management.nvidia.com/gpu"
      spec-dirs = ["/etc/cdi", "/var/run/cdi"]

    [nvidia-container-runtime.modes.csv]
      mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
  path = "/usr/local/nvidia/toolkit/nvidia-container-runtime-hook"
  skip-mode-detection = true

[nvidia-ctk]
  path = "/usr/local/nvidia/toolkit/nvidia-ctk"
time="2024-06-03T10:46:29Z" level=info msg="Setting up runtime"
time="2024-06-03T10:46:29Z" level=info msg="Parsing arguments: [/usr/local/nvidia/toolkit]"
time="2024-06-03T10:46:29Z" level=info msg="Successfully parsed arguments"
time="2024-06-03T10:46:29Z" level=info msg="Starting 'setup' for containerd"
time="2024-06-03T10:46:29Z" level=info msg="Config file does not exist; using empty config"
time="2024-06-03T10:46:29Z" level=info msg="Flushing config to /runtime/config-dir/config.toml"
time="2024-06-03T10:46:29Z" level=info msg="Sending SIGHUP signal to containerd"
time="2024-06-03T10:46:29Z" level=info msg="Successfully signaled containerd"
time="2024-06-03T10:46:29Z" level=info msg="Completed 'setup' for containerd"
time="2024-06-03T10:46:29Z" level=info msg="Waiting for signal"

yanis-incepto avatar Jun 03 '24 10:06 yanis-incepto
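
A hedged aside on the log above: the toolkit writes a containerd config (the in-pod path /runtime/config-dir/config.toml typically maps to the host containerd config path the operator was configured with), sends SIGHUP to containerd, and then sits in "Waiting for signal", which is its normal steady state. The "Config file does not exist; using empty config" line is worth noting, though: if the node keeps its containerd config at a different path than the one the operator manages, the toolkit may be writing a file containerd never reads. One way to confirm on the GPU node that the nvidia runtime landed in the config containerd actually uses, assuming the default host path /etc/containerd/config.toml:

sudo grep -A 5 'runtimes.nvidia' /etc/containerd/config.toml
# If the entry is present but the error persists, a full containerd restart
# (rather than the SIGHUP above) sometimes helps:
sudo systemctl restart containerd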

Please restart the nvidia-operator-validator-zk4mv pod first. If it proceeds past init, then restart the other pods too.

elezar avatar Jun 03 '24 11:06 elezar
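
A hedged sketch of that restart: deleting the pod lets its DaemonSet recreate it.

kubectl delete pod -n gpu-operator nvidia-operator-validator-zk4mv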

I just recreated the pod; still the same issue.

yanis-incepto avatar Jun 03 '24 11:06 yanis-incepto

Is there a compatibility table for gpu-operator? Maybe the latest version is not compatible with Kubernetes 1.24.14?

yanis-incepto avatar Jun 03 '24 11:06 yanis-incepto

I had this issue when my /etc/containerd/config.toml was incorrect (was missing runc from default). This is what it looks like now on each node:

version = 2

[plugins]

  [plugins."io.containerd.grpc.v1.cri"]

    [plugins."io.containerd.grpc.v1.cri".containerd]

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"

	[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = "/usr/bin/runc"

Li357 avatar Jun 08 '24 00:06 Li357
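
For completeness, a hedged note: after editing /etc/containerd/config.toml by hand, containerd has to reload it before the new runtime entries take effect, for example:

sudo systemctl restart containerd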

Hello, thanks for your help, but unfortunately I just tried the config above and it didn't work.

yanis-incepto avatar Jun 10 '24 08:06 yanis-incepto

@yanis-incepto can you share the contents of your /etc/containerd/config.toml file?

cdesiniotis avatar Jul 11 '24 23:07 cdesiniotis
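
A hedged example of gathering that on the GPU node (containerd config dump shows the effective configuration after defaults and imports are merged):

cat /etc/containerd/config.toml
containerd config dump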

Hello, I have the exact same issue. My cluster is built on k0s, the Kubernetes version is 1.32.4, and the nvidia-smi output is:

kubectl exec -it nvidia-driver-daemonset-68thz -- nvidia-smi
Thu Jul 10 07:54:35 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000001:00:00.0 Off |                    0 |
| N/A   31C    P0             42W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

behnm avatar Jul 10 '25 07:07 behnm
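
A hedged note for the k0s case: k0s runs its own containerd, whose socket and config live under k0s-specific paths rather than /etc/containerd, so the toolkit container has to be pointed at them through the chart's toolkit.env values. A sketch of what that can look like in the Helm values (the paths are assumptions; verify them against the k0s and GPU Operator documentation for your versions):

toolkit:
  env:
    - name: CONTAINERD_CONFIG
      value: /etc/k0s/containerd.d/nvidia.toml   # assumed k0s drop-in path
    - name: CONTAINERD_SOCKET
      value: /run/k0s/containerd.sock            # assumed k0s socket path
    - name: CONTAINERD_RUNTIME_CLASS
      value: nvidia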

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.

github-actions[bot] avatar Nov 04 '25 22:11 github-actions[bot]

This issue has been open for a long time without recent updates, and the context may now be outdated. More details were requested in https://github.com/NVIDIA/gpu-operator/issues/730#issuecomment-2224096611 but there has been no update since then. Hence, closing this issue.

@behnm I would suggest following the latest procedure for installing GPU Operator 25.10.0 with k0s: https://catalog.k0rdent.io/latest/apps/nvidia/#install. If you are still experiencing problems, please file a new issue.

cdesiniotis avatar Nov 15 '25 02:11 cdesiniotis