Run more than 1 replica of CSI deployment
Describe the solution you'd like
We would like to deploy multiple replicas of the CSI deployment. During rolling restarts of our Kubernetes clusters, nodes are sometimes restarted while the pod of the CSI deployment is being recreated, and this causes the CSI daemonset pods on those nodes to fail.
Describe alternatives you've considered
We are planning to move the CSI deployment to the control plane nodes to reduce how often it is evicted and recreated during cluster maintenance. However, our preference would be to run this deployment in an HA configuration by scaling it out to multiple replicas.
Hi @fhke,
We would like to better understand how the Trident daemonset Pods fail when the Trident controller Pod is not running in the Kubernetes cluster. Once a Trident daemonset Pod has initialized, it attempts to register with the Trident controller Pod. If it cannot register, it retries indefinitely using a backoff mechanism that starts with 10 seconds between attempts and increases up to a maximum of 120 seconds.
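To make the timing above concrete, here is a rough sketch of a capped backoff schedule in Python. It is not Trident's code. The function names `backoff_delays` and `register_until_success` are hypothetical, and the doubling multiplier is an assumption: the only figures stated here are the 10-second start and the 120-second ceiling.

```python
import itertools
import time
from typing import Callable, Iterator


def backoff_delays(initial: float = 10.0, maximum: float = 120.0,
                   multiplier: float = 2.0) -> Iterator[float]:
    """Yield retry delays in seconds, forever.

    Starts at `initial` and multiplies by `multiplier` after each attempt,
    never exceeding `maximum`. ASSUMPTION: the multiplier of 2 is illustrative;
    only the 10 s start and 120 s cap are stated in the thread.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * multiplier, maximum)


def register_until_success(register: Callable[[], bool],
                           sleep: Callable[[float], None] = time.sleep) -> int:
    """Hypothetical retry loop: call `register` until it returns True.

    There is no attempt limit ("retry indefinitely"). Returns the number of
    attempts it took to register.
    """
    delays = backoff_delays()
    attempts = 0
    while True:
        attempts += 1
        if register():
            return attempts
        sleep(next(delays))  # wait before the next registration attempt


if __name__ == "__main__":
    # Print the first seven delays in the schedule.
    print(list(itertools.islice(backoff_delays(), 7)))
```

Under these assumptions the delays are 10, 20, 40, 80 and then 120 seconds for every later attempt, so a node pod waits at most about two minutes between attempts once the controller Pod is back.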
If you are experiencing an issue where the Trident daemonset remains in a failed state, we would like to understand why. If possible, please open a NetApp support case so that we can root-cause your issue more quickly.
I think @fhke means the trident-csi deployment, not the daemonset. For example, on one of my clusters:
$ oc get deploy -n kube-trident-operator
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
trident-csi        1/1     1            1           96d
trident-operator   1/1     1            1           96d
$ oc get daemonset -n kube-trident-operator
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                     AGE
trident-csi   9         9         9       9            9           kubernetes.io/arch=amd64,kubernetes.io/os=linux   96d
I am also interested in running multiple replicas of the trident-csi deployment to improve availability.