
pod labels error: still failing after 200 retries

Open neelasha-09 opened this issue 5 years ago • 3 comments

Hi Team,

While creating a postgres cluster, the PVC takes ~5 minutes to bind, due to which the cluster fails with the error below:

time="2020-07-14T12:32:14Z" level=info msg="Event(v1.ObjectReference{Kind:\"postgresql\", Namespace:\"postgres-test\", Name:\"postgres-operator-cluster\", UID:\"cee078a3-0bea-4175-a6c8-7530590058ac\", APIVersion:\"acid.zalan.do/v1\", ResourceVersion:\"157085264\", FieldPath:\"\"}): type: 'Warning' reason: 'Sync' could not sync cluster: could not sync statefulsets: cluster is not ready: pod labels error: still failing after 200 retries"

Could you please suggest how to increase the number of retries, or the wait time until the PVC is bound, so that we do not hit this error?

Thanks

neelasha-09 avatar Jul 15 '20 06:07 neelasha-09

5 minutes sounds too long. Which provisioner are you using? That said, the timeouts can be configured. Try increasing these values if needed.
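
For example, with the ConfigMap-based configuration, something along these lines should work (the key names are the operator's timeout parameters; the ConfigMap name, namespace and values below are only illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator   # name and namespace depend on your installation
data:
  # how often the operator re-checks a resource while waiting (default: 3s)
  resource_check_interval: "5s"
  # how long to wait for a Kubernetes resource such as the StatefulSet (default: 10m)
  resource_check_timeout: "20m"
  # how long to wait for pods to carry the expected labels (default: 10m);
  # presumably the wait behind "pod labels error: still failing after N retries"
  pod_label_wait_timeout: "20m"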

FxKu avatar Jul 21 '20 16:07 FxKu

Hi

The error says it fails after 200 retries. The resource_check_interval is 3s, which means 3s * 200 retries = 600 s = 10 mins, and the resource_check_timeout is 10 mins by default.

The question is: if the PVC takes ~5 mins to bind, why the error? Please let me know if I missed anything :)

Thanks in advance.

neelasha-09 avatar Aug 24 '20 12:08 neelasha-09

I know this is an ancient issue, but I ran into a similar problem today and thought I would provide some context in case others run into the same situation. This was while provisioning a ZPO-managed Postgresql database on OpenShift CRC. I have provisioned many of them successfully, so this seemed more like a bizarre one-off incident, but I gathered some logs and events in case it helps anyone here at some point.

Operator logs are not saying much:

time="2024-05-22T09:35:19Z" level=info msg="ADD event has been queued" cluster-name=appns/app-db pkg=controller worker=1
time="2024-05-22T09:35:19Z" level=info msg="creating a new Postgres cluster" cluster-name=appns/app-db pkg=controller worker=1
time="2024-05-22T09:35:19Z" level=info msg="master service \"appns/app-db\" has been successfully created" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=info msg="replica service \"appns/app-db-repl\" has been successfully created" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=debug msg="team API is disabled" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=debug msg="team API is disabled" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=info msg="users have been initialized" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=info msg="syncing secrets" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=debug msg="created new secret appns/standby.app-db.credentials.postgresql.acid.zalan.do, namespace: appns, uid: 7a125a12-afca-4322-99da-6d648a0b204e" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=debug msg="created new secret appns/appuser.app-db.credentials.postgresql.acid.zalan.do, namespace: appns, uid: d8c479c3-5ee6-48c8-843a-25a2591918df" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=debug msg="created new secret appns/postgres.app-db.credentials.postgresql.acid.zalan.do, namespace: appns, uid: ea890b6d-28b9-40b4-be7d-d5ed9cd2b19c" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=info msg="secrets have been successfully created" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=info msg="pod disruption budget \"appns/postgres-app-db-pdb\" has been successfully created" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=debug msg="created new statefulset \"appns/app-db\", uid: \"072aa2ee-a1c2-4f5d-8648-30bea93adcdf\"" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=info msg="statefulset \"appns/app-db\" has been successfully created" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:19Z" level=info msg="waiting for the cluster being ready" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:35:25Z" level=debug msg="Waiting for 2 pods to become ready" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:45:22Z" level=error msg="failed to create cluster: pod labels error: still failing after 200 retries" cluster-name=appns/app-db pkg=cluster worker=1
time="2024-05-22T09:45:22Z" level=error msg="could not create cluster: pod labels error: still failing after 200 retries" cluster-name=appns/app-db pkg=controller worker=1

The events are more interesting, since they contain "FailedMount" and "NetworkNotReady" warnings, and I cannot explain why those would happen.

100m        Warning   FailedMount             pod/app-db-0                                       MountVolume.SetUp failed for volume "kube-api-access-z79rx" : [object "appns"/"kube-root-ca.crt" not registered, object "appns"/"openshift-service-ca.crt" not registered]
100m        Warning   FailedMount             pod/app-db-1                                       MountVolume.SetUp failed for volume "kube-api-access-vl99g" : [object "appns"/"kube-root-ca.crt" not registered, object "appns"/"openshift-service-ca.crt" not registered]
100m        Warning   NetworkNotReady         pod/app-db-1                                       network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
100m        Warning   NetworkNotReady         pod/app-db-0                                       network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
99m         Warning   FailedMount             pod/app-db-0                                       MountVolume.MountDevice failed for volume "pvc-e51cbbaf-6d9f-4baf-83b0-b7f46a53f1f1" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name kubevirt.io.hostpath-provisioner not found in the list of registered CSI drivers
99m         Warning   FailedMount             pod/app-db-1                                       MountVolume.MountDevice failed for volume "pvc-1be09b97-42e7-46ed-9151-7a5ca99fc5ac" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name kubevirt.io.hostpath-provisioner not found in the list of registered CSI drivers
99m         Normal    AddedInterface          pod/app-db-0                                       Add eth0 [10.217.0.65/23] from openshift-sdn
99m         Normal    Pulled                  pod/app-db-0                                       Container image "ghcr.io/zalando/spilo-15:3.0-p1" already present on machine
99m         Normal    Started                 pod/app-db-0                                       Started container postgres
99m         Normal    Created                 pod/app-db-0                                       Created container postgres
99m         Normal    Pulled                  pod/app-db-1                                       Container image "ghcr.io/zalando/spilo-15:3.0-p1" already present on machine
99m         Normal    AddedInterface          pod/app-db-1                                       Add eth0 [10.217.0.70/23] from openshift-sdn
99m         Normal    Started                 pod/app-db-1                                       Started container postgres
99m         Normal    Created                 pod/app-db-1                                       Created container postgres
53m         Normal    Delete                  postgresql/app-db                                  Started deletion of new cluster resources
53m         Normal    Killing                 pod/app-db-0                                       Stopping container postgres
50m         Normal    Killing                 pod/app-db-1                                       Stopping container postgres
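
For anyone who wants to collect the same kind of information, the usual kubectl commands should be enough, roughly (namespace and pod names match my example above; the operator's namespace and deployment name depend on your installation):

# pods and PVC binding status in the affected namespace
kubectl -n appns get pods,pvc

# recent events, oldest first
kubectl -n appns get events --sort-by=.lastTimestamp

# why a pod is stuck (volumes, scheduling, conditions)
kubectl -n appns describe pod app-db-0

# operator logs
kubectl -n postgres-operator logs deploy/postgres-operator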

I had successfully created another Postgresql database about a minute before the above problem happened, which makes it even stranger.

Operator logs of successful db creation:

time="2024-05-22T09:34:48Z" level=info msg="ADD event has been queued" cluster-name=appns/main-db pkg=controller worker=0
time="2024-05-22T09:34:48Z" level=info msg="creating a new Postgres cluster" cluster-name=appns/main-db pkg=controller worker=0
time="2024-05-22T09:34:49Z" level=info msg="master service \"appns/main-db\" has been successfully created" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=info msg="replica service \"appns/main-db-repl\" has been successfully created" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=debug msg="team API is disabled" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=debug msg="team API is disabled" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=info msg="users have been initialized" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=info msg="syncing secrets" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=debug msg="created new secret appns/ardome.main-db.credentials.postgresql.acid.zalan.do, namespace: appns, uid: 8649dc23-8344-430c-b0b8-9ad3c1bdfbbb" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=debug msg="created new secret appns/postgres.main-db.credentials.postgresql.acid.zalan.do, namespace: appns, uid: 8c598bb5-006d-48e6-95d0-2a67741bc4e3" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=debug msg="created new secret appns/standby.main-db.credentials.postgresql.acid.zalan.do, namespace: appns, uid: 53e89029-b44d-48c0-97be-9a4e95ba44da" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=info msg="secrets have been successfully created" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=info msg="pod disruption budget \"appns/postgres-main-db-pdb\" has been successfully created" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=debug msg="created new statefulset \"appns/main-db\", uid: \"400bf089-6931-4411-ba16-84b300c15d9e\"" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=info msg="statefulset \"appns/main-db\" has been successfully created" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:49Z" level=info msg="waiting for the cluster being ready" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:34:55Z" level=debug msg="Waiting for 2 pods to become ready" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:43Z" level=info msg="pods are ready" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:43Z" level=info msg="Create roles" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:43Z" level=debug msg="closing database connection" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:43Z" level=info msg="users have been successfully created" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="creating database \"mdb\" owner \"mdbuser\"" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=debug msg="closing database connection" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="databases have been successfully created" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found pod disruption budget: \"appns/postgres-main-db-pdb\" (uid: \"6b353b91-26b1-4bda-b096-6fe07596e875\")" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found statefulset: \"appns/main-db\" (uid: \"400bf089-6931-4411-ba16-84b300c15d9e\")" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found secret: \"appns/standby.main-db.credentials.postgresql.acid.zalan.do\" (uid: \"53e89029-b44d-48c0-97be-9a4e95ba44da\") namesapce: appns" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found secret: \"appns/ardome.main-db.credentials.postgresql.acid.zalan.do\" (uid: \"8649dc23-8344-430c-b0b8-9ad3c1bdfbbb\") namesapce: appns" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found secret: \"appns/postgres.main-db.credentials.postgresql.acid.zalan.do\" (uid: \"8c598bb5-006d-48e6-95d0-2a67741bc4e3\") namesapce: appns" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found master service: \"appns/main-db\" (uid: \"3419103e-b7ca-4359-8550-0872a7936623\")" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found replica service: \"appns/main-db-repl\" (uid: \"3d84525a-afa5-4e28-88da-a3b1e2a78877\")" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found pod: \"appns/main-db-0\" (uid: \"0edb066f-86ee-4cad-ae32-6d3d6b2b65c9\")" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found pod: \"appns/main-db-1\" (uid: \"7c3ec328-aea3-4d8c-bb40-a67bd3d0e181\")" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found PVC: \"appns/pgdata-main-db-0\" (uid: \"226a6f4e-8b1e-48c1-acd1-84065545f64d\")" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="found PVC: \"appns/pgdata-main-db-1\" (uid: \"afa44811-13b4-4a99-8161-7b7a85d19e87\")" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=debug msg="syncing connection pooler (master, replica) from (false, nil) to (false, nil)" cluster-name=appns/main-db pkg=cluster worker=0
time="2024-05-22T09:35:44Z" level=info msg="cluster has been created" cluster-name=appns/main-db pkg=controller worker=0

The problematic database is just a test/dev database so I ended up deleting it and recreating it, and that worked fine.
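
For completeness, deleting and recreating such a cluster basically amounts to removing the postgresql custom resource and re-applying its manifest, roughly (the manifest filename is just a placeholder):

# delete the postgresql custom resource; the operator then tears down the cluster resources
kubectl -n appns delete postgresql app-db

# re-apply the original manifest so the operator creates the cluster again
kubectl -n appns apply -f app-db.yaml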

bergner avatar May 22 '24 10:05 bergner

@FxKu Hi, still facing the same issue in 2025. I got the same problem as bergner, and the reason is unknown. I am using Zalando postgres operator version 1.10.1. Are there any updates or fixes?

Razher avatar Jan 17 '25 11:01 Razher