postgres-operator Retry unsuccessful failover on unschedulable nodes

Retry unsuccessful failover on unschedulable nodes

Open simonklb opened this issue 1 year ago • 0 comments

Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.12.2
Where do you run it - cloud or metal? Kubernetes or OpenShift? Kubernetes
Are you running Postgres Operator in production? yes
Type of issue? Feature request

When you drain a node where the leader is running before any replica has become ready, the failover does not succeed. That is good. However, once the replica becomes ready, the failover is never retried, and you have to uncordon the node and redo the drain for it to succeed.

I believe the relevant part is here: https://github.com/zalando/postgres-operator/blob/2e398120d2d0b3bb2b8bb239c6d49011ebe37e88/pkg/controller/node.go#L68-L72

Would you be open to changing this behavior? Is there any harm in letting the failover retry if the node is still not ready?

Sep 04 '24 11:09 simonklb
