Deploy nvidia-device-plugin-daemonset to only certain nodes

Open Hayes-buzzni opened this issue 2 years ago • 2 comments

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

1. Quick Debug Checklist

  • [x] Are you running on an Ubuntu 18.04 node?
  • [x] Are you running Kubernetes v1.13+?
  • [x] Are you running Docker (>= 18.06) or CRIO (>= 1.13+)?
  • [x] Do you have i2c_core and ipmi_msghandler loaded on the nodes?
  • [x] Did you apply the CRD (kubectl describe clusterpolicies --all-namespaces)?

2. Issue or feature description

I'm trying to use helm to deploy gpu-operator (or at least the nvidia-device-plugin it deploys) to only certain nodes. I've checked the issue pages and the documentation, but I haven't found a way to target specific nodes. I also tried changing the template to specify affinity, but that didn't work well. Is there any way to deploy gpu-operator, or the nvidia-device-plugin that gpu-operator deploys, to specific nodes?
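
For context, this is roughly what node affinity on a DaemonSet looks like in plain Kubernetes. It is only a generic sketch of the mechanism, not gpu-operator's own configuration: the names, the label key `example.com/gpu-workload`, and the image are hypothetical placeholders.

```yaml
# Sketch only: a generic DaemonSet restricted to labeled nodes via node affinity.
# All names, the label key/value, and the image are hypothetical placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: example-device-plugin
  template:
    metadata:
      labels:
        app: example-device-plugin
    spec:
      affinity:
        nodeAffinity:
          # Hard requirement: pods are scheduled only on nodes carrying this label.
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: example.com/gpu-workload
                    operator: In
                    values: ["true"]
      containers:
        - name: device-plugin
          image: registry.example.com/device-plugin:placeholder
```

A node would opt in with something like `kubectl label node <node-name> example.com/gpu-workload=true`. Since the operator creates and manages the device-plugin DaemonSet itself, affinity set elsewhere (for example on the operator's own pod template in the chart) would presumably not carry over to that DaemonSet, which may be why the template change had no effect.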

Hayes-buzzni, Jul 10 '23 11:07

I created a related PR to add node selector capabilities to common daemonset config via the ClusterPolicy: https://gitlab.com/nvidia/kubernetes/gpu-operator/-/merge_requests/976. I think this should address the need here.

rockholla, Dec 06 '23 15:12

Hi @rockholla, is there any progress on this? I've run into the same problem. If you don't have spare time, I can pick up the work from where you left off.

chaunceyjiang, Dec 18 '23 09:12