gpu-operator

Too many labels are added to nodes, especially non-GPU nodes.

Open johnzheng1975 opened this issue 1 year ago • 0 comments

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case here.

1. Quick Debug Information

  • OS/Version(e.g. RHEL8.6, Ubuntu22.04): aws linux
  • K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): eks 1.29
  • GPU Operator Version: v24.3.0 (chart: gpu-operator, repo URL: https://helm.ngc.nvidia.com/nvidia)

2. Issue or feature description

Far too many labels are added to nodes (we could say a crazy number ^_^ ), especially to non-GPU nodes.

image

At the very least, I hope the operator can stop adding labels to non-GPU nodes, or add only one or two labels to them.

3. Steps to reproduce the issue

Detailed steps to reproduce the issue:

  • Install gpu-operator.
  • View the node labels.
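To put a number on "too many", one rough way is to count the labels on each node. This is only a sketch: `count_node_labels` is a hypothetical helper name, not part of gpu-operator or kubectl, and it assumes the labels are the last whitespace-separated column of `kubectl get nodes --no-headers --show-labels`, joined by commas.

```shell
# Hypothetical helper (not part of gpu-operator or kubectl).
# Reads the output of `kubectl get nodes --no-headers --show-labels` on stdin
# and prints one line per node: "<node-name> <label-count>".
# Assumes the comma-separated label list is the last column of each line.
count_node_labels() {
  awk '{ n = split($NF, labels, ","); print $1, n }'
}

# Usage against a live cluster (requires kubectl access to the cluster):
#   kubectl get nodes --show-labels                               # full label list
#   kubectl get nodes --no-headers --show-labels | count_node_labels
```

Comparing the counts for GPU and non-GPU nodes before and after installing the chart should show how many labels the install adds to each kind of node.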

4. Information to attach (optional if deemed irrelevant)

  • [ ] kubernetes pods status: kubectl get pods -n OPERATOR_NAMESPACE
  • [ ] kubernetes daemonset status: kubectl get ds -n OPERATOR_NAMESPACE
  • [ ] If a pod/ds is in an error state or pending state kubectl describe pod -n OPERATOR_NAMESPACE POD_NAME
  • [ ] If a pod/ds is in an error state or pending state kubectl logs -n OPERATOR_NAMESPACE POD_NAME --all-containers
  • [ ] Output from running nvidia-smi from the driver container: kubectl exec DRIVER_POD_NAME -n OPERATOR_NAMESPACE -c nvidia-driver-ctr -- nvidia-smi
  • [ ] containerd logs journalctl -u containerd > containerd.log

Collecting full debug bundle (optional):

curl -o must-gather.sh -L https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/hack/must-gather.sh 
chmod +x must-gather.sh
./must-gather.sh

NOTE: please refer to the must-gather script for debug data collected.

This bundle can be submitted to us via email: [email protected]

johnzheng1975, Jun 07 '24 02:06