postgres-operator
The operator rolling-updates StatefulSets unnecessarily when the Kube API is down.
- Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
- Where do you run it - cloud or metal? Kubernetes or OpenShift? Kubernetes
- Are you running Postgres Operator in production? yes
- Type of issue? Bug report
Hello,
The operator may roll out StatefulSets unnecessarily if the Kube API is down.
In
https://github.com/zalando/postgres-operator/blob/6d0117b662bc2fd7880352b58258792ff2941d0a/pkg/cluster/k8sres.go#L992-L1023
if the operator cannot fetch the secret (for example because of `etcdserver: request timed out`), it returns an empty slice. The operator then builds the desired StatefulSet without the secret's env variables and updates the StatefulSet:
```
level=warning msg="could not read Secret PodEnvironmentSecretName: etcdserver: request timed out" cluster-name=namespace/pg-cluster pkg=cluster
...
level=debug msg="mark rolling update annotation for pg-cluster: reason pod changes" cluster-name=namespace/pg-cluster pkg=cluster
level=info msg="statefulset namespace/pg-cluster is not in the desired state and needs to be updated" cluster-name=namespace/pg-cluster pkg=cluster
... SHOWING the diff ...
...
level=info msg="reason: new statefulset containers's postgres (index 0) environment does not match the current one" cluster-name=namespace/pg-cluster pkg=cluster
level=debug msg="updating statefulset" cluster-name=namespace/pg-cluster pkg=cluster
...
level=debug msg="performing rolling update" cluster-name=namespace/pg-cluster pkg=cluster
```
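The problematic pattern can be sketched as follows. This is a minimal, self-contained illustration, not the operator's actual code: `fetchSecret`, `envFromSecretCurrent`, and the `secret` type are hypothetical stand-ins for the real client-go calls and `k8sres.go` logic.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-in for a Kubernetes Secret.
type secret struct{ Data map[string][]byte }

var errTimeout = errors.New("etcdserver: request timed out")

// fetchSecret simulates a Kube API call that fails while the API is down.
func fetchSecret(name string, apiDown bool) (*secret, error) {
	if apiDown {
		return nil, errTimeout
	}
	return &secret{Data: map[string][]byte{"PGPASSWORD": []byte("s3cr3t")}}, nil
}

// envFromSecretCurrent mirrors the problematic pattern: on ANY error it
// only logs a warning and returns an empty slice, so the desired
// StatefulSet is built without the secret's env variables, producing a
// spurious diff and a rolling update.
func envFromSecretCurrent(name string, apiDown bool) []string {
	sec, err := fetchSecret(name, apiDown)
	if err != nil {
		fmt.Printf("warning: could not read Secret %s: %v\n", name, err)
		return nil // empty env -> spurious diff -> rolling update
	}
	env := make([]string, 0, len(sec.Data))
	for k := range sec.Data {
		env = append(env, k)
	}
	return env
}

func main() {
	// While the API is down the env list is empty; once it is back the
	// secret's variables reappear, which triggers a second rolling update.
	fmt.Println(len(envFromSecretCurrent("pod-env-secret", true)))  // 0
	fmt.Println(len(envFromSecretCurrent("pod-env-secret", false))) // 1
}
```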
Of course, when the Kube API is up again, the operator will perform another rolling update to re-inject the secret's content.
Is this acceptable behaviour? Not for me. I think the operator should drop the env variables only when the secret lookup returns `apierrors.IsNotFound` (i.e. the secret was deleted because it is no longer needed, and even in that case I'm not sure). For any other error, the operator should stop syncing the cluster and wait until it has a clear view of the real current state.
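The suggested behaviour could be sketched like this. Again this is a hypothetical illustration, not a patch: `isNotFound` stands in for `apierrors.IsNotFound`, and `envFromSecretFixed` takes an abstract fetch function instead of the real client-go secret getter.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins: errNotFound plays the role of a Kubernetes NotFound error,
// errTimeout a transient API failure such as "etcdserver: request timed out".
var (
	errNotFound = errors.New("secrets \"pod-env-secret\" not found")
	errTimeout  = errors.New("etcdserver: request timed out")
)

// isNotFound stands in for apierrors.IsNotFound.
func isNotFound(err error) bool { return errors.Is(err, errNotFound) }

// envFromSecretFixed only treats a deleted secret as "no env variables".
// Any other error is propagated so the caller can abort the sync instead
// of diffing against an accidentally empty environment.
func envFromSecretFixed(fetch func() (map[string][]byte, error)) ([]string, error) {
	data, err := fetch()
	switch {
	case isNotFound(err):
		return nil, nil // secret intentionally gone: empty env is correct
	case err != nil:
		return nil, fmt.Errorf("could not read secret, skipping sync: %w", err)
	}
	env := make([]string, 0, len(data))
	for k := range data {
		env = append(env, k)
	}
	return env, nil
}

func main() {
	// Transient failure: the sync stops rather than rolling the StatefulSet.
	if _, err := envFromSecretFixed(func() (map[string][]byte, error) { return nil, errTimeout }); err != nil {
		fmt.Println("abort sync:", err)
	}
	// Secret genuinely deleted: proceed with an empty environment.
	env, err := envFromSecretFixed(func() (map[string][]byte, error) { return nil, errNotFound })
	fmt.Println(len(env), err)
}
```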
We use single-node PG clusters with standby clusters, as we don't want to enable failover for now. But I think even with multi-node PG clusters this would still perform two rolling updates for nothing.