
Liveness probe failed: Get "http://10.0.1.136:8081/readyz": dial tcp 10.0.1.136:8081: connect: connection refused

Open raphael10-collab opened this issue 3 months ago • 2 comments

(base) raphy@raohy:~/.talos/crunchy$ nano pgo-pv-worker-1.yaml

# https://github.com/rancher/local-path-provisioner/blob/master/examples/pvc-with-local-volume/pvc.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pgo-pv-worker-1
  namespace: postgres-operator
  annotations:
    volumeType: local
spec:
  storageClassName: local-storage-pgo
  local:
    path: /var/local-path-provisioner
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s-eu-1-worker-1

(base) raphy@raohy:~/.talos/crunchy$ kubectl apply -f pgo-pv-worker-1.yaml
persistentvolume/pgo-pv-worker-1 created



(base) raphy@raohy:~/.talos/crunchy/postgres-operator-examples$ kubectl apply -k kustomize/install/namespace
namespace/postgres-operator unchanged
(base) raphy@raohy:~/.talos/crunchy/postgres-operator-examples$ kubectl apply --server-side -k kustomize/install/default
customresourcedefinition.apiextensions.k8s.io/crunchybridgeclusters.postgres-operator.crunchydata.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/pgadmins.postgres-operator.crunchydata.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/pgupgrades.postgres-operator.crunchydata.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/postgresclusters.postgres-operator.crunchydata.com serverside-applied
serviceaccount/pgo serverside-applied
clusterrole.rbac.authorization.k8s.io/postgres-operator serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/postgres-operator serverside-applied
deployment.apps/pgo serverside-applied




(base) raphy@raohy:~/.talos/crunchy/postgres-operator-examples$ kubectl apply -k kustomize/postgres
postgrescluster.postgres-operator.crunchydata.com/hippo created
(base) raphy@raohy:~/.talos/crunchy/postgres-operator-examples$ 
(base) raphy@raohy:~/.talos/crunchy/postgres-operator-examples$ kubectl -n postgres-operator describe postgresclusters.postgres-operator.crunchydata.com hippo
Name:         hippo
Namespace:    postgres-operator
Labels:       <none>
Annotations:  postgres-operator.crunchydata.com/autoCreateUserSchema: true
API Version:  postgres-operator.crunchydata.com/v1beta1
Kind:         PostgresCluster
Metadata:
  Creation Timestamp:  2025-10-05T16:44:56Z
  Generation:          1
  Resource Version:    94367216
  UID:                 afdd69e9-14f5-4520-8660-17ebeb985a0f
Spec:
  Backups:
    Pgbackrest:
      Repos:
        Name:  repo1
        Volume:
          Volume Claim Spec:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage:  1Gi
  Instances:
    Data Volume Claim Spec:
      Access Modes:
        ReadWriteOnce
      Resources:
        Requests:
          Storage:   1Gi
    Name:            instance1
    Replicas:        1
  Port:              5432
  Postgres Version:  17
  Users:
    Databases:
      zoo
    Name:  hippo
Events:    <none>





(base) raphy@raohy:~/.talos/crunchy$ git clone --depth 1 https://github.com/raphael10-collab/postgres-operator-examples.git
Cloning into 'postgres-operator-examples'...
remote: Enumerating objects: 166, done.
remote: Counting objects: 100% (166/166), done.
remote: Compressing objects: 100% (129/129), done.
remote: Total 166 (delta 34), reused 99 (delta 22), pack-reused 0 (from 0)
Receiving objects: 100% (166/166), 191.61 KiB | 2.62 MiB/s, done.
Resolving deltas: 100% (34/34), done.

But after a few seconds, the pgo pod goes into the CrashLoopBackOff state:

NAME                       READY   STATUS             RESTARTS     AGE
pod/pgo-86d5685c85-kjspd   0/1     CrashLoopBackOff   5 (9s ago)   6m13s

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/pgo   0/1     1            0           6m13s

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/pgo-86d5685c85   1         1         0       6m13s
(base) raphy@raohy:~/.talos/crunchy/postgres-operator-examples$ kubectl -n postgres-operator logs pgo-86d5685c85-kjspd
time="2025-10-05T16:49:33Z" level=debug msg="debug flag set to true" version=5.8.3-0
time="2025-10-05T16:49:33Z" level=info msg="feature gates" PGO_FEATURE_GATES= enabled="AutoCreateUserSchema=true,InstanceSidecars=true,PGUpgradeCPUConcurrency=true" version=5.8.3-0
time="2025-10-05T16:50:31Z" level=info msg="received signal from OS" signal=terminated version=5.8.3-0
panic: Get "https://10.96.0.1:443/api?timeout=32s": dial tcp 10.96.0.1:443: i/o timeout
	Get "https://10.96.0.1:443/version?timeout=32s": dial tcp 10.96.0.1:443: i/o timeout

goroutine 1 [running]:
main.assertNoError(...)
	github.com/crunchydata/postgres-operator/cmd/postgres-operator/main.go:46
main.main()
	github.com/crunchydata/postgres-operator/cmd/postgres-operator/main.go:206 +0x1325
(base) raphy@raohy:~/.talos/crunchy/postgres-operator-examples$ 
(base) raphy@raohy:~/.talos/crunchy/postgres-operator-examples$ kubectl -n postgres-operator describe pod pgo-86d5685c85-kjspd
Name:             pgo-86d5685c85-kjspd
Namespace:        postgres-operator
Priority:         0
Service Account:  pgo
Node:             k8s-eu-1-worker-2/10.0.0.5
Start Time:       Sun, 05 Oct 2025 18:44:29 +0200
Labels:           app.kubernetes.io/name=pgo
                  app.kubernetes.io/version=5.8.3
                  pod-template-hash=86d5685c85
                  postgres-operator.crunchydata.com/control-plane=postgres-operator
Annotations:      <none>
Status:           Running
IP:               10.0.1.136
IPs:
  IP:           10.0.1.136
Controlled By:  ReplicaSet/pgo-86d5685c85
Containers:
  operator:
    Container ID:    containerd://e656b2eb359be35f303a21f996c339581d0e7256476ad127a8119d1dca7583f9
    Image:           registry.developers.crunchydata.com/crunchydata/postgres-operator:ubi9-5.8.3-0
    Image ID:        registry.developers.crunchydata.com/crunchydata/postgres-operator@sha256:ab3d639e8ddacf2def17c15e6ae50ff7d0f3f28fac0783828a65675d109e1b0a
    Port:            8443/TCP
    Host Port:       0/TCP
    SeccompProfile:  RuntimeDefault
    State:           Waiting
      Reason:        CrashLoopBackOff
    Last State:      Terminated
      Reason:        Error
      Exit Code:     2
      Started:       Sun, 05 Oct 2025 18:52:02 +0200
      Finished:      Sun, 05 Oct 2025 18:53:02 +0200
    Ready:           False
    Restart Count:   6
    Liveness:        http-get http://:8081/readyz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:       http-get http://:8081/healthz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      PGO_FEATURE_GATES:                  
      PGO_INSTALLER:                      kustomize
      PGO_INSTALLER_ORIGIN:               examples-repo
      PGO_CONTROLLER_LEASE_NAME:          cpk-leader-election-lease
      PGO_NAMESPACE:                      postgres-operator (v1:metadata.namespace)
      CRUNCHY_DEBUG:                      true
      RELATED_IMAGE_POSTGRES_16:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi9-16.10-2534
      RELATED_IMAGE_POSTGRES_16_GIS_3.3:  registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi9-16.10-3.3-2534
      RELATED_IMAGE_POSTGRES_16_GIS_3.4:  registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi9-16.10-3.4-2534
      RELATED_IMAGE_POSTGRES_17:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi9-17.6-2534
      RELATED_IMAGE_POSTGRES_17_GIS_3.4:  registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi9-17.6-3.4-2534
      RELATED_IMAGE_POSTGRES_17_GIS_3.5:  registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi9-17.6-3.5-2534
      RELATED_IMAGE_PGBACKREST:           registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi9-2.56.0-2534
      RELATED_IMAGE_PGBOUNCER:            registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi9-1.24-2534
      RELATED_IMAGE_PGEXPORTER:           registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi9-0.17.1-2534
      RELATED_IMAGE_PGUPGRADE:            registry.developers.crunchydata.com/crunchydata/crunchy-upgrade:ubi9-17.6-2534
      RELATED_IMAGE_STANDALONE_PGADMIN:   registry.developers.crunchydata.com/crunchydata/crunchy-pgadmin4:ubi9-9.2-2534
      RELATED_IMAGE_COLLECTOR:            registry.developers.crunchydata.com/crunchydata/postgres-operator:ubi9-5.8.3-0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-767cz (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  kube-api-access-767cz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                     From               Message
  ----     ------       ----                    ----               -------
  Normal   Scheduled    8m50s                   default-scheduler  Successfully assigned postgres-operator/pgo-86d5685c85-kjspd to k8s-eu-1-worker-2
  Warning  FailedMount  8m49s                   kubelet            MountVolume.SetUp failed for volume "kube-api-access-767cz" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Killing      7m48s                   kubelet            Container operator failed liveness probe, will be restarted
  Normal   Pulled       7m47s (x2 over 8m48s)   kubelet            Container image "registry.developers.crunchydata.com/crunchydata/postgres-operator:ubi9-5.8.3-0" already present on machine
  Normal   Created      7m47s (x2 over 8m48s)   kubelet            Created container: operator
  Normal   Started      7m47s (x2 over 8m47s)   kubelet            Started container operator
  Warning  Unhealthy    7m8s (x5 over 8m28s)    kubelet            Liveness probe failed: Get "http://10.0.1.136:8081/readyz": dial tcp 10.0.1.136:8081: connect: connection refused
  Warning  Unhealthy    3m48s (x34 over 8m38s)  kubelet            Readiness probe failed: Get "http://10.0.1.136:8081/healthz": dial tcp 10.0.1.136:8081: connect: connection refused

What am I doing wrong? How can I make it work?

raphael10-collab, Oct 05 '25 17:10

Hi @raphael10-collab! Sorry to hear you're having trouble.

Based on the error you're seeing, it looks like the operator is unable to talk to the Kubernetes API server. This indicates either a network issue in your Kubernetes cluster, or something like a Network Policy blocking network traffic, rather than being an issue with CPK itself.
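One way to narrow this down is to test plain TCP reachability of the API server Service from a shell on the pod network of the affected node. The following is only a sketch under stated assumptions: it needs bash (for the /dev/tcp pseudo-device) and the coreutils `timeout` command, and `check_tcp` is a hypothetical helper name, not part of PGO or kubectl. The fallback address and port are the ones from the panic message in the operator log.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PGO or kubectl): returns 0 if a TCP
# connection to host:port opens within the timeout, non-zero otherwise.
check_tcp() {
  local host="$1" port="$2" secs="${3:-5}"
  # bash's /dev/tcp pseudo-device opens a TCP socket; `timeout` bounds the wait.
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Inside a pod, Kubernetes injects these variables for the API server Service.
# The fallbacks are the address and port from the operator's panic message.
api_host="${KUBERNETES_SERVICE_HOST:-10.96.0.1}"
api_port="${KUBERNETES_SERVICE_PORT:-443}"

if check_tcp "$api_host" "$api_port" 3; then
  echo "reachable: ${api_host}:${api_port}"
else
  echo "unreachable: ${api_host}:${api_port}"
fi
```

If the check fails from a pod on k8s-eu-1-worker-2 but succeeds from pods on other nodes, that would suggest a node-specific networking problem (CNI or kube-proxy) rather than a cluster-wide policy.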

andrewlecuyer, Oct 06 '25 15:10

I got this error running Amazon Linux 2023 nodes. I had to revert to Amazon Linux 2 AMI nodes to get the PostgresClusters running correctly. @raphael10-collab are you on EKS?

The fix for running on Amazon Linux 2023 nodes appears to be moving to the UBI 9-based container images.

paddatrapper, Oct 08 '25 17:10