The Trident operator fails to install via Helm on Rancher
Describe the bug
When installing the Trident operator from the Helm chart in a Kubernetes cluster managed by Rancher, the installation fails because the operator is unable to add the PSA label pod-security.kubernetes.io/enforce: privileged to its installation namespace. This is because Rancher has a special admission webhook in place for setting PSA labels. The webhook requires a dedicated permission (the updatepsa verb on projects in the management.cattle.io API group), which must be granted to the operator's ServiceAccount on top of all the other RBAC rules it needs.
Environment
- Trident version: 23.04.0
- Trident installation flags used: helm install trident netapp-trident/trident-operator --version 23.04.0 --create-namespace --namespace trident
- Container runtime: Containerd v1.6.19-k3s1
- Kubernetes version: v1.25.9
- Kubernetes orchestrator: Rancher v2.7.5
- Kubernetes enabled feature gates: None.
- OS: Ubuntu 22.04.2 LTS
- NetApp backend types: n/a
- Other: n/a
To Reproduce
- Have a Rancher-managed RKE2 cluster (though I'm guessing this reproduces on any Rancher-managed cluster).
- helm repo add netapp-trident https://netapp.github.io/trident-helm-chart
- helm install trident netapp-trident/trident-operator --version 23.04.0 --create-namespace --namespace trident
- Check the status of the installed CRDs, the TridentOrchestrator object and the pods deployed:
$ kubectl get crd | grep trident
tridentorchestrators.trident.netapp.io 2023-06-28T14:56:46Z
$ kubectl -n trident get pods
NAME READY STATUS RESTARTS AGE
trident-operator-5789cf4777-nc4vn 1/1 Running 0 7m32s
$ kubectl -n trident get tridentorchestrators trident -o yaml
[…]
status:
  message: 'Failed to install Trident; err: failed to patch Trident installation namespace trident; admission webhook "rancher.cattle.io.namespaces" denied the request: Unauthorized'
  namespace: trident
  status: Failed
  version: ""
Expected behavior
I expect it to deploy as it should and not crash. Here's an example of what it looks like when deploying successfully:
$ kubectl -n trident get pods
NAME READY STATUS RESTARTS AGE
trident-controller-6d7c9c5d8c-wg8zj 6/6 Running 0 4h28m
trident-node-linux-4tk6q 2/2 Running 0 4h28m
trident-node-linux-97rgx 2/2 Running 0 4h28m
trident-node-linux-9jfbh 2/2 Running 0 4h28m
trident-node-linux-btjx6 2/2 Running 0 4h28m
trident-node-linux-n5k75 2/2 Running 0 4h28m
trident-node-linux-vpcgd 2/2 Running 0 4h28m
trident-operator-5789cf4777-66mth 1/1 Running 0 4h29m
$ kubectl get crd | grep trident
tridentbackendconfigs.trident.netapp.io 2023-07-05T08:09:56Z
tridentbackends.trident.netapp.io 2023-07-05T08:09:55Z
tridentmirrorrelationships.trident.netapp.io 2023-07-05T08:10:00Z
tridentnodes.trident.netapp.io 2023-07-05T08:09:58Z
tridentorchestrators.trident.netapp.io 2023-06-28T14:56:46Z
tridentsnapshotinfos.trident.netapp.io 2023-07-05T08:09:56Z
tridentsnapshots.trident.netapp.io 2023-07-05T08:09:59Z
tridentstorageclasses.trident.netapp.io 2023-07-05T08:09:56Z
tridenttransactions.trident.netapp.io 2023-07-05T08:09:59Z
tridentversions.trident.netapp.io 2023-07-05T08:09:55Z
tridentvolumepublications.trident.netapp.io 2023-07-05T08:09:57Z
tridentvolumereferences.trident.netapp.io 2023-07-05T08:10:00Z
tridentvolumes.trident.netapp.io 2023-07-05T08:09:57Z
Additional context
This was already reported to Rancher's GitHub page as issue #41191. People (understandably) thought that this was a bug in Rancher, while it's more of a documentation issue on their part (in my opinion).
There's also some information available in the operator's pod logs. I don't have them easily available right now, but it basically amounts to the same message as the one displayed by the TridentOrchestrator object anyway; it fails to patch the trident namespace because the Rancher admission webhook rancher.cattle.io.namespaces denied the request (Unauthorized).
Work-around
Inspired by this comment from the issue reported to Rancher's GitHub page, applying the following manifest and then restarting the operator fixes the issue:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: trident-operator-psa
rules:
  - apiGroups:
      - management.cattle.io
    resources:
      - projects
    verbs:
      - updatepsa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: trident-operator-psa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: trident-operator-psa
subjects:
  - kind: ServiceAccount
    name: trident-operator
    namespace: trident
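For anyone scripting this, here is a minimal sketch of the apply-and-restart step. It assumes the manifest above was saved as trident-operator-psa.yaml (a hypothetical file name) and that the operator runs as a Deployment named trident-operator, as the pod names above suggest. The KUBECTL variable and the function name are illustrative only.

```shell
# Sketch only: apply the work-around manifest, then restart the operator
# so that it retries the namespace patch with the new permission.
# Assumption: KUBECTL can be overridden (e.g. KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

apply_psa_workaround() {
  manifest="$1"    # path to the saved manifest (hypothetical file name)
  namespace="$2"   # namespace the operator was installed into
  "$KUBECTL" apply -f "$manifest" &&
    "$KUBECTL" rollout restart deployment/trident-operator --namespace "$namespace"
}
```

For the installation described in this issue, the call would be apply_psa_workaround trident-operator-psa.yaml trident. The TridentOrchestrator status should then be checked again as in the reproduction steps.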
We're running into the same issue after upgrading from Rancher 2.6.11 to 2.7.5. I can confirm that your workaround fixes the issue.
@lindhe: Thanks for bringing this up and creating the corresponding pull request. I can also confirm that this solves the issue in my cluster.
Does NetApp have a plan to merge this at some point? Applying these workarounds in automation is a bit cumbersome and unclean.
We're still seeing the same issue in Rancher 2.7.9 and Trident 23.10.0. Can we perhaps get an update from NetApp on this issue and the pending PR?
@nheinemans-asml Could you try with v24.10.0? It's apparently resolved there, but I have no idea which PR that was.
@lindhe I tested it with Rancher v2.9.2 and Trident 24.10.0, and it is still an issue. After applying the workaround it succeeds:
kubectl describe torc trident
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Installing 16m trident-operator.netapp.io Installing Trident
Warning Failed 3m45s (x6 over 16m) trident-operator.netapp.io Failed to install Trident; err: failed to patch Trident installation namespace netapp-trident; admission webhook "rancher.cattle.io.namespaces" denied the request: Unauthorized
Normal Installed 27s trident-operator.netapp.io Trident installed
Hi @betweenclouds, this should have been fixed in 24.10.0 as part of https://github.com/NetApp/trident/commit/5824103a201cb2f1be13f9435e554ad160c829b3
Can you try setting forceInstallRancherClusterRoles: true in helm/trident-operator/values.yaml?
@sjpeeris Thank you, with forceInstallRancherClusterRoles=true the installation is successful, but only if I install into a namespace named trident. Is this expected behavior?
works:
helm install netapp-trident netapp-trident/trident-operator --version 100.2410.0 --create-namespace --namespace trident --set tridentDebug=true --set forceInstallRancherClusterRoles=true
does not work:
helm install netapp-trident netapp-trident/trident-operator --version 100.2410.0 --create-namespace --namespace netapp-trident --set tridentDebug=true --set forceInstallRancherClusterRoles=true
edit:
Namespace is hard-coded here: https://github.com/NetApp/trident/blob/master/helm/trident-operator/templates/clusterrolebinding-rancher.yaml#L13
instead of a variable like here: https://github.com/NetApp/trident/blob/master/helm/trident-operator/templates/clusterrolebinding.yaml#L10
Hi @betweenclouds, you are correct. That namespace shouldn't be hard-coded. We will have this fixed in the next release. Thanks for pointing that out.
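Until that fix ships, one option for installs outside the trident namespace is to bind the role by hand, as in the original work-around. The sketch below is not an official procedure. It assumes the trident-operator-psa ClusterRole from the work-around manifest earlier in this thread already exists, and that the chart names the ServiceAccount trident-operator regardless of the Helm release name. The binding name, the KUBECTL variable and the function name are made up for illustration.

```shell
# Sketch only: bind the existing trident-operator-psa ClusterRole to the
# operator's ServiceAccount in an arbitrary installation namespace.
# Assumption: the ServiceAccount is called trident-operator.
KUBECTL="${KUBECTL:-kubectl}"

bind_psa_role() {
  namespace="$1"   # namespace the operator was installed into
  "$KUBECTL" create clusterrolebinding "trident-operator-psa-${namespace}" \
    --clusterrole=trident-operator-psa \
    --serviceaccount="${namespace}:trident-operator"
}
```

For the failing install above, the call would be bind_psa_role netapp-trident, followed by a restart of the operator.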