
User "system:serviceaccount:default:gpu-operator" cannot list resource "daemonsets" in API group "apps" at the cluster scope with Helm Template

Open Li357 opened this issue 1 year ago • 1 comment

1. Quick Debug Information

  • OS/Version (e.g. RHEL 8.6, Ubuntu 22.04): Ubuntu 22.04
  • Kernel Version: 5.4.0-177-generic
  • Container Runtime Type/Version (e.g. Containerd, CRI-O, Docker): containerd
  • K8s Flavor/Version (e.g. K8s, OCP, Rancher, GKE, EKS): k8s
  • GPU Operator Version: 24.3.0

2. Issue or feature description

I'm using Kustomize's helmCharts feature to install gpu-operator (via helm template followed by kubectl apply), but for some reason I'm getting

E0612 21:52:49.269498       1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Failed to watch *v1.DaemonSet: failed to list *v1.DaemonSet: daemonsets.apps is forbidden: User "system:serviceaccount:default:gpu-operator" cannot list resource "daemonsets" in API group "apps" at the cluster scope

in my gpu-operator pod. Also, the gpu-operator pod is not in the namespace I passed to Helm; for some reason it ends up in the default namespace. This doesn't make sense, since the cluster role appears to be configured correctly. I've verified that the cluster role rendered by helm template is correct.

3. Steps to reproduce the issue

helm template --generate-name --namespace test --create-namespace nvidia/gpu-operator | kubectl apply -f -
kubectl get pods # for some reason gpu-operator is now in the default namespace instead of the provided namespace?
kubectl logs <gpu-operator-podname>

I get that I should be installing through helm install, but I want to use this with Kustomize, which internally uses helm template to inflate charts. Why is the install behavior different from the template behavior? (I didn't see any hooks.)
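For anyone hitting the same symptom, a plausible explanation is that the rendered manifests don't carry an explicit metadata.namespace, so kubectl apply falls back to the current context's namespace (usually default) while the RBAC subjects are rendered against the --namespace value. A quick way to check this against the repro above (only the commands already shown are used; this is a sketch, not a confirmed diagnosis):

# Render the chart exactly as in the repro and inspect the operator
# Deployment's metadata. If there is no "namespace: test" line, kubectl apply
# will create the Deployment (and its ServiceAccount) in the current
# context's namespace, which matches the "serviceaccount:default" identity
# in the error above.
helm template --namespace test nvidia/gpu-operator | grep -B 1 -A 4 '^kind: Deployment'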

Li357 avatar Jun 12 '24 22:06 Li357

@Li357 any update on this? I have the same problem.

Using

...

helmCharts:
  - ...

namespace: gpu-operator

fixes the problem for me, but this could cause other problems, and I would like it to work with the -n flag passed to helm template.
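For reference, a minimal kustomization.yaml sketch of that workaround, relying on Kustomize's top-level namespace transformer so every rendered resource gets an explicit metadata.namespace (the repo URL, release name, and version below are illustrative, not taken from this thread):

# kustomization.yaml -- sketch of the workaround above
namespace: gpu-operator            # namespace transformer: stamps metadata.namespace on all rendered resources
helmCharts:
  - name: gpu-operator
    repo: https://helm.ngc.nvidia.com/nvidia   # illustrative; the usual NVIDIA chart repo
    releaseName: gpu-operator                  # illustrative release name
    version: v24.3.0                           # illustrative; pick a current release
    includeCRDs: true

The output can then be built and applied with kustomize build --enable-helm . | kubectl apply -f -.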

Bfault avatar Jul 17 '24 18:07 Bfault

I have the same issue. I am using Kustomize to generate the configuration. The role and role binding both appear to be correct.
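One way to confirm the mismatch is to compare the namespace the ClusterRoleBinding expects the service account to live in with the namespace it actually landed in. A rough check (the binding is assumed to be named gpu-operator, which may differ with chart version and release name):

# Where does the binding expect the operator's ServiceAccount to be?
kubectl get clusterrolebinding gpu-operator \
  -o jsonpath='{range .subjects[*]}{.kind} {.namespace}/{.name}{"\n"}{end}'
# Where did the ServiceAccount and the operator pod actually end up?
kubectl get serviceaccount --all-namespaces | grep gpu-operator
kubectl get pods --all-namespaces | grep gpu-operator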

mike-ensor avatar Nov 27 '24 03:11 mike-ensor

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.

github-actions[bot] avatar Nov 04 '25 22:11 github-actions[bot]

This issue has been open for over 90 days without recent updates, and the context may now be outdated.

This has been fixed in newer versions of gpu-operator. Given that gpu-operator 24.3.0 is now EOL, I would encourage you to try the latest version and see if you still hit this issue.

If this issue is still relevant with the latest version of the NVIDIA GPU Operator, please feel free to reopen it or open a new one with updated details.

rahulait avatar Nov 14 '25 19:11 rahulait

This looks like a valid bug and needs to be fixed; reopening. There is no mention of the namespace here, and when one does a template and apply, the gpu-operator deployment is put in the default namespace. This needs to be fixed.
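For context on what a fix typically looks like: charts that are meant to survive a plain helm template | kubectl apply usually stamp the release namespace into each namespaced resource. A generic illustration of that pattern (this is not the actual gpu-operator template, and the fullname helper is hypothetical):

# templates/deployment.yaml -- generic chart excerpt, not from gpu-operator
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "mychart.fullname" . }}   # hypothetical naming helper
  namespace: {{ .Release.Namespace }}        # keeps "helm template | kubectl apply" in the intended namespace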

rahulait avatar Nov 14 '25 19:11 rahulait