postgres-operator

Retry unsuccessful failover on unschedulable nodes

Open: simonklb opened this issue 1 year ago • 0 comments

  • Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.12.2
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Kubernetes
  • Are you running Postgres Operator in production? yes
  • Type of issue? Feature request

When I drain the node where the leader is running before any replica has become ready, the failover does not succeed. That is expected and desirable. However, if a replica becomes ready afterwards, the failover is never retried, and I have to uncordon the node and redo the drain for it to succeed.

I believe the relevant part is here: https://github.com/zalando/postgres-operator/blob/2e398120d2d0b3bb2b8bb239c6d49011ebe37e88/pkg/controller/node.go#L68-L72
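To illustrate why the failover is not retried, here is a simplified, hypothetical model of the behavior described above. It is not the operator's actual code. It assumes the node update handler compares the previous and current node state and only triggers the failover on the transition from ready to not ready (edge-triggered). `NodeState`, `nodeIsReady`, and `shouldMoveMasters` are illustrative names, and only the cordon flag is modelled.

```go
package main

import "fmt"

// NodeState is a hypothetical, simplified stand-in for a Kubernetes node.
// Only the cordon flag (spec.unschedulable) is modelled here.
type NodeState struct {
	Unschedulable bool
}

// nodeIsReady treats a node as ready when it can accept new pods.
// The real operator check may also consider node labels.
func nodeIsReady(n NodeState) bool {
	return !n.Unschedulable
}

// shouldMoveMasters is edge-triggered: it returns true only for the update
// event in which the node goes from ready to not ready.
func shouldMoveMasters(prev, cur NodeState) bool {
	return nodeIsReady(prev) && !nodeIsReady(cur)
}

func main() {
	schedulable := NodeState{Unschedulable: false}
	cordoned := NodeState{Unschedulable: true}

	// Node update events as (previous, current) pairs.
	events := [][2]NodeState{
		{schedulable, cordoned}, // drain starts: failover attempted, fails if no replica is ready
		{cordoned, cordoned},    // later updates while still cordoned: no new attempt,
		{cordoned, cordoned},    // even if a replica has become ready in the meantime
	}
	for i, ev := range events {
		fmt.Printf("event %d: move masters = %v\n", i, shouldMoveMasters(ev[0], ev[1]))
	}
}
```

Under this model, only the first event triggers a failover attempt, which matches the observed need to uncordon and drain again.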

Would you be open to changing this behavior? Is there any harm in letting the failover retry if the node is still not ready?
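One possible shape for the retry is a level-triggered check, sketched below under the same hypothetical model (not the operator's real code or API). It assumes the operator keeps receiving update events for the cordoned node and can tell whether a leader pod is still running on it; `shouldTryMoveMasters` and the leader count are illustrative assumptions.

```go
package main

import "fmt"

// NodeState is a hypothetical, simplified stand-in for a Kubernetes node.
type NodeState struct {
	Unschedulable bool
}

// nodeIsReady treats a node as ready when it can accept new pods.
func nodeIsReady(n NodeState) bool {
	return !n.Unschedulable
}

// shouldTryMoveMasters is a hypothetical level-triggered check: it returns
// true on every update while the node is not ready and still hosts at least
// one leader pod, so a failover that failed earlier is attempted again.
func shouldTryMoveMasters(cur NodeState, leadersOnNode int) bool {
	return !nodeIsReady(cur) && leadersOnNode > 0
}

func main() {
	cordoned := NodeState{Unschedulable: true}
	leaders := 1
	// Whether a replica is ready at the time of each node update.
	replicaReady := []bool{false, false, true}

	for i, ready := range replicaReady {
		if !shouldTryMoveMasters(cordoned, leaders) {
			fmt.Printf("update %d: nothing to do\n", i)
			continue
		}
		if ready {
			leaders-- // the leader moves off the node once a replica is ready
			fmt.Printf("update %d: failover succeeded\n", i)
		} else {
			fmt.Printf("update %d: failover failed, will retry on next update\n", i)
		}
	}
}
```

The trade-off is that a level-triggered check fires on every node update, so a real implementation would probably need some rate limiting or backoff to avoid repeated failover attempts while no replica is ready.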

simonklb, Sep 04 '24 11:09