Operator should create one daemonset for each Linux distribution found in the node list

Open MartinForReal opened this issue 5 years ago • 0 comments

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

1. Quick Debug Checklist

  • [ * ] Are you running on an Ubuntu 18.04 node? (Ubuntu 20.04.1 LTS and CentOS 7.8)
  • [ * ] Are you running Kubernetes v1.13+?
  • [ * ] Are you running Docker (>= 18.06) or CRIO (>= 1.13+)?
  • [ * ] Do you have i2c_core and ipmi_msghandler loaded on the nodes?
  • [ * ] Did you apply the CRD (kubectl describe clusterpolicies --all-namespaces)?

1. Issue or feature description

I have several CentOS nodes and Ubuntu nodes in the same cluster, and I want to use gpu-operator to install the NVIDIA driver. However, the operator tries to install the CentOS driver on the Ubuntu nodes, because the first node that carries nvidia.com/gpu-present=true is CentOS-based. The operator should create a separate driver daemonset for each distribution.
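To illustrate the requested behavior, here is a minimal Go sketch that groups GPU nodes by distribution and derives one daemonset name per group, instead of taking the OS from the first GPU node. It is not the operator's code: the `Node` and `Distro` types, both function names, and the daemonset naming scheme are hypothetical. The OS label keys are the ones Node Feature Discovery publishes, and the GPU label is copied from the text above as written.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	// GPU label as written in this issue (assumption: adjust to the label the cluster uses).
	gpuPresentLabel = "nvidia.com/gpu-present"
	// OS release labels published by Node Feature Discovery.
	osIDLabel      = "feature.node.kubernetes.io/system-os_release.ID"
	osVersionLabel = "feature.node.kubernetes.io/system-os_release.VERSION_ID"
)

// Node is a hypothetical stand-in for a Kubernetes node; only the name
// and labels matter for this sketch.
type Node struct {
	Name   string
	Labels map[string]string
}

// Distro identifies one OS release, e.g. {ubuntu 20.04} or {centos 7}.
type Distro struct {
	ID      string
	Version string
}

// groupGPUNodesByDistro returns the names of GPU nodes keyed by distribution.
// Nodes without the GPU label or without both OS labels are skipped.
func groupGPUNodesByDistro(nodes []Node) map[Distro][]string {
	groups := make(map[Distro][]string)
	for _, n := range nodes {
		if n.Labels[gpuPresentLabel] != "true" {
			continue
		}
		id, version := n.Labels[osIDLabel], n.Labels[osVersionLabel]
		if id == "" || version == "" {
			continue
		}
		d := Distro{ID: id, Version: version}
		groups[d] = append(groups[d], n.Name)
	}
	return groups
}

// driverDaemonSetNames returns one hypothetical daemonset name per
// distribution, sorted so the result is deterministic.
func driverDaemonSetNames(groups map[Distro][]string) []string {
	names := make([]string, 0, len(groups))
	for d := range groups {
		names = append(names, fmt.Sprintf("nvidia-driver-daemonset-%s%s", d.ID, d.Version))
	}
	sort.Strings(names)
	return names
}
```

Each resulting daemonset would also need a node selector on the same two OS labels, so that its pods land only on nodes of the matching distribution.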

2. Steps to reproduce the issue

Install gpu-operator on a two-node cluster in which one node runs Ubuntu and the other runs CentOS.

3. Information to attach (optional if deemed irrelevant)

  • [ ] kubernetes pods status: kubectl get pods --all-namespaces

  • [ ] kubernetes daemonset status: kubectl get ds --all-namespaces

  • [ ] If a pod/ds is in an error state or pending state: kubectl describe pod -n NAMESPACE POD_NAME

  • [ ] If a pod/ds is in an error state or pending state: kubectl logs -n NAMESPACE POD_NAME

  • [ ] Output of running a container on the GPU machine: docker run -it alpine echo foo

  • [ ] Docker configuration file: cat /etc/docker/daemon.json

  • [ ] Docker runtime configuration: docker info | grep runtime

  • [ ] NVIDIA shared directory: ls -la /run/nvidia

  • [ ] NVIDIA packages directory: ls -la /usr/local/nvidia/toolkit

  • [ ] NVIDIA driver directory: ls -la /run/nvidia/driver

  • [ ] kubelet logs: journalctl -u kubelet > kubelet.logs

https://github.com/NVIDIA/gpu-operator/blob/a8be154494b46b63ac6d020ef57096504046f225/pkg/controller/clusterpolicy/object_controls.go#L156-L178

MartinForReal, Nov 03 '20 10:11