Run more than 1 replica of CSI deployment

Open fhke opened this issue 3 years ago • 6 comments

Describe the solution you'd like

We would like to deploy multiple replicas of the CSI deployment. Currently we experience a condition during rolling restarts of our Kubernetes clusters where nodes are restarted while the CSI pod is being recreated, which causes the CSI daemonset pod on those nodes to fail.

Describe alternatives you've considered

We are planning to move the CSI deployment to the control plane nodes to reduce how often it gets evicted and recreated during cluster maintenance, but our preference would be to run this deployment in an HA configuration by scaling out to multiple replicas.

fhke, May 31 '22 11:05

Hi @fhke,

We would like to better understand how the Trident daemonset Pods are failing when the Trident controller Pod is not running in the Kubernetes cluster. Once a Trident daemonset Pod has initialized, it attempts to register with the Trident controller Pod. If it cannot register, it retries indefinitely with a backoff that starts at 10 seconds between attempts and can grow to a maximum of 120 seconds between attempts.
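As a rough illustration of that retry schedule (a sketch only, not Trident's actual code): the 10-second start and 120-second cap come from the description above, while the doubling factor and the names `backoff_intervals` and `register_with_retry` are assumptions made for the example.

```python
import time
from itertools import islice
from typing import Callable, Iterator


def backoff_intervals(initial: float = 10.0,
                      maximum: float = 120.0,
                      multiplier: float = 2.0) -> Iterator[float]:
    """Yield wait times that start at `initial` and never exceed `maximum`.

    The multiplier is an assumption; only the start and the cap are
    taken from the description above.
    """
    interval = initial
    while True:
        yield min(interval, maximum)
        interval = min(interval * multiplier, maximum)


def register_with_retry(register: Callable[[], bool],
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `register` until it returns True, waiting between failures.

    `register` is a hypothetical stand-in for the node-to-controller
    registration call. Returns the number of attempts made.
    """
    attempts = 0
    for wait in backoff_intervals():
        attempts += 1
        if register():
            return attempts
        sleep(wait)  # retry indefinitely, with growing waits
    return attempts  # unreachable: the generator never ends


# First six waits under the assumed doubling factor.
print(list(islice(backoff_intervals(), 6)))
```

Under these assumptions, a node Pod that starts while the controller is down would keep retrying and register once the controller is back, at worst about 120 seconds after it becomes reachable.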

If you are experiencing an issue where the Trident daemonset remains in a failed state, we would like to understand why. If possible, please open a NetApp support case so that we can find the root cause of your issue more quickly.

gnarl, Jun 09 '22 19:06

I think @fhke means the trident-csi deployment, not the daemonset. For example, on one of my clusters:

$ oc get deploy -n kube-trident-operator
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
trident-csi        1/1     1            1           96d
trident-operator   1/1     1            1           96d

$ oc get daemonset -n kube-trident-operator
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                     AGE
trident-csi   9         9         9       9            9           kubernetes.io/arch=amd64,kubernetes.io/os=linux   96d

I am also interested in running multiple replicas of the trident-csi deployment, to provide better availability.

djjudas21, Jul 13 '22 14:07