
[bitnami/postgresql-ha] chart, second slave gets stuck waiting for primary node and CrashLoopBackOff

Open nleeuskadi opened this issue 1 year ago • 1 comments

Hi Bitnami team :)

I am facing an issue while using the postgresql-ha chart. The problem is that one of the 2 slave nodes fails to start because it gets stuck waiting for the primary node.

You can find more details below.

Thank you in advance for your help :)

Cheers!

Name and Version

bitnami/postgresql-ha latest (I can also reproduce it with version 13.2.4)

What architecture are you using?

Kubernetes Kind on Ubuntu virtual machines

What steps will reproduce the bug?

  1. In this environment: a 4-node Kubernetes cluster on Kind v0.10.0, hosted on Ubuntu 22.04.2 LTS VMs

  2. With this config: no particular configuration for my test

  3. Run: install the postgresql-ha Helm chart with default values in order to deploy a redundant PostgreSQL with 1 primary node and 2 slaves: helm install bitnami-redounded oci://registry-1.docker.io/bitnamicharts/postgresql-ha --namespace test-bitnami

  4. Issue: after a while, the following resources are running, but only one PostgreSQL slave node is up and running, synchronized with the primary node. The second slave seems to fail to connect to the primary, since it is automatically restarted by kubelet after a timeout.

kubectl -n test-bitnami get all
NAME                                                               READY   STATUS             RESTARTS       AGE
pod/bitnami-tsdb-redounded-postgresql-ha-pgpool-6464fdf9f6-kd5dd   1/1     Running            0              11m
pod/bitnami-tsdb-redounded-postgresql-ha-postgresql-0              1/1     Running            0              11m
pod/bitnami-tsdb-redounded-postgresql-ha-postgresql-1              0/1     CrashLoopBackOff   5 (101s ago)   11m
pod/bitnami-tsdb-redounded-postgresql-ha-postgresql-2              1/1     Running            0              11m

NAME                                                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/bitnami-tsdb-redounded-postgresql-ha-pgpool                ClusterIP   10.96.133.175   <none>        5432/TCP   11m
service/bitnami-tsdb-redounded-postgresql-ha-postgresql            ClusterIP   10.96.30.24     <none>        5432/TCP   11m
service/bitnami-tsdb-redounded-postgresql-ha-postgresql-headless   ClusterIP   None            <none>        5432/TCP   11m

NAME                                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/bitnami-tsdb-redounded-postgresql-ha-pgpool   1/1     1            1           11m

NAME                                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/bitnami-tsdb-redounded-postgresql-ha-pgpool-6464fdf9f6   1         1         1       11m

NAME                                                               READY   AGE
statefulset.apps/bitnami-tsdb-redounded-postgresql-ha-postgresql   2/3     11m
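When a listing like the one above gets long, the pods that are not fully ready can be filtered out with a small helper. This is only a sketch: `not_ready_pods` is a hypothetical function name, and it assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS, ...).

```shell
# Print "NAME STATUS" for every pod whose READY column (e.g. 0/1) shows
# fewer ready containers than expected.
# Reads `kubectl get pods`-style output on stdin; the header line and any
# line whose second column is not of the form N/M are ignored.
not_ready_pods() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
    split($2, r, "/")
    if (r[1] + 0 != r[2] + 0) print $1, $3
  }'
}

# Usage (requires access to the cluster):
#   kubectl -n test-bitnami get pods | not_ready_pods
```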

Below are the logs of the second slave, which fails to run. It gets stuck waiting for the primary node until a timeout, when kubelet restarts the container:

kubectl -n test-bitnami logs -f bitnami-tsdb-redounded-postgresql-ha-postgresql-1
postgresql-repmgr 23:17:58.62 INFO  ==> 
postgresql-repmgr 23:17:58.62 INFO  ==> Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 23:17:58.62 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql-repmgr 23:17:58.62 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql-repmgr 23:17:58.62 INFO  ==> 
postgresql-repmgr 23:17:58.63 INFO  ==> ** Starting PostgreSQL with Replication Manager setup **
postgresql-repmgr 23:17:58.65 INFO  ==> Validating settings in REPMGR_* env vars...
postgresql-repmgr 23:17:58.65 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql-repmgr 23:17:58.65 INFO  ==> Querying all partner nodes for common upstream node...
postgresql-repmgr 23:18:03.69 INFO  ==> Node configured as standby
postgresql-repmgr 23:18:03.70 INFO  ==> Preparing PostgreSQL configuration...
postgresql-repmgr 23:18:03.70 INFO  ==> postgresql.conf file not detected. Generating it...
postgresql-repmgr 23:18:03.76 INFO  ==> Preparing repmgr configuration...
postgresql-repmgr 23:18:03.77 INFO  ==> Initializing Repmgr...
postgresql-repmgr 23:18:03.77 INFO  ==> Waiting for primary node...
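Since the log stops at "Waiting for primary node...", one way to narrow things down is to check whether the primary's headless-service DNS name is reachable from inside the cluster. The sketch below is an assumption-laden illustration, not part of the chart: `primary_fqdn` and `check_primary` are hypothetical helper names, the throwaway pod name `pg-check` is arbitrary, and it assumes `pg_isready` is on the PATH of the image used.

```shell
# Build the primary's DNS name following the layout of REPMGR_PRIMARY_HOST
# in the pod description:
#   <statefulset>-0.<statefulset>-headless.<namespace>.svc.cluster.local
primary_fqdn() {
  sts="$1"; ns="$2"
  printf '%s-0.%s-headless.%s.svc.cluster.local\n' "$sts" "$sts" "$ns"
}

# Probe the primary on port 5432 from a throwaway pod using pg_isready.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print the command
# instead of running it.
check_primary() {
  sts="$1"; ns="$2"; image="$3"
  "${KUBECTL:-kubectl}" -n "$ns" run pg-check --rm -i --restart=Never \
    --image="$image" --command -- \
    pg_isready -h "$(primary_fqdn "$sts" "$ns")" -p 5432
}

# Usage (requires access to the cluster):
#   check_primary bitnami-tsdb-redounded-postgresql-ha-postgresql test-bitnami \
#     registry-1.docker.io/bitnami/postgresql-repmgr:16.2.0-debian-11-r18
```

Note that the throwaway pod may be scheduled on a different worker than the failing pod, so a successful check does not rule out a node-specific networking problem. The output of the last terminated container can also be retrieved with `kubectl -n test-bitnami logs --previous bitnami-tsdb-redounded-postgresql-ha-postgresql-1`.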

Below are the logs of the slave which started up successfully:

kubectl -n test-bitnami logs -f bitnami-tsdb-redounded-postgresql-ha-postgresql-2
postgresql-repmgr 23:09:33.76 INFO  ==> 
postgresql-repmgr 23:09:33.76 INFO  ==> Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 23:09:33.77 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql-repmgr 23:09:33.77 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql-repmgr 23:09:33.77 INFO  ==> 
postgresql-repmgr 23:09:33.78 INFO  ==> ** Starting PostgreSQL with Replication Manager setup **
postgresql-repmgr 23:09:33.79 INFO  ==> Validating settings in REPMGR_* env vars...
postgresql-repmgr 23:09:33.79 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql-repmgr 23:09:33.80 INFO  ==> Querying all partner nodes for common upstream node...
postgresql-repmgr 23:09:34.09 INFO  ==> Node configured as standby
postgresql-repmgr 23:09:34.09 INFO  ==> Preparing PostgreSQL configuration...
postgresql-repmgr 23:09:34.10 INFO  ==> postgresql.conf file not detected. Generating it...
postgresql-repmgr 23:09:34.17 INFO  ==> Preparing repmgr configuration...
postgresql-repmgr 23:09:34.18 INFO  ==> Initializing Repmgr...
postgresql-repmgr 23:09:34.18 INFO  ==> Waiting for primary node...
postgresql-repmgr 23:09:54.23 INFO  ==> Rejoining node...
postgresql-repmgr 23:09:54.23 INFO  ==> Cloning data from primary node...
postgresql-repmgr 23:09:55.32 INFO  ==> Initializing PostgreSQL database...
postgresql-repmgr 23:09:55.33 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected
postgresql-repmgr 23:09:55.33 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
postgresql-repmgr 23:09:55.34 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql-repmgr 23:09:55.35 INFO  ==> Configuring replication parameters
postgresql-repmgr 23:09:55.37 INFO  ==> Configuring fsync
postgresql-repmgr 23:09:55.38 INFO  ==> Setting up streaming replication slave...
postgresql-repmgr 23:09:55.39 INFO  ==> Starting PostgreSQL in background...
postgresql-repmgr 23:09:56.20 INFO  ==> Unregistering standby node...
postgresql-repmgr 23:09:56.21 INFO  ==> Registering Standby node...
postgresql-repmgr 23:09:56.26 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down.... done
server stopped
postgresql-repmgr 23:09:56.37 INFO  ==> ** PostgreSQL with Replication Manager setup finished! **

postgresql-repmgr 23:09:56.38 INFO  ==> Starting PostgreSQL in background...
waiting for server to start....2024-02-18 23:09:56.400 GMT [234] LOG:  pgaudit extension initialized
2024-02-18 23:09:56.408 GMT [234] LOG:  redirecting log output to logging collector process
2024-02-18 23:09:56.408 GMT [234] HINT:  Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2024-02-18 23:09:56.408 GMT [234] LOG:  starting PostgreSQL 16.2 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-02-18 23:09:56.440 GMT [234] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-02-18 23:09:56.462 GMT [234] LOG:  listening on IPv6 address "::", port 5432
2024-02-18 23:09:56.482 GMT [234] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-02-18 23:09:56.491 GMT [238] LOG:  database system was shut down in recovery at 2024-02-18 23:09:56 GMT
2024-02-18 23:09:56.491 GMT [238] LOG:  entering standby mode
2024-02-18 23:09:56.495 GMT [238] LOG:  redo starts at 0/5000028
2024-02-18 23:09:56.496 GMT [238] LOG:  consistent recovery state reached at 0/6000830
2024-02-18 23:09:56.496 GMT [238] LOG:  invalid record length at 0/6000830: expected at least 24, got 0
2024-02-18 23:09:56.496 GMT [234] LOG:  database system is ready to accept read-only connections
2024-02-18 23:09:56.508 GMT [239] LOG:  started streaming WAL from primary at 0/6000000 on timeline 1
 done
server started
postgresql-repmgr 23:09:56.59 INFO  ==> ** Starting repmgrd **
[2024-02-18 23:09:56] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2024-02-18 23:09:56] [NOTICE] starting monitoring of node "bitnami-tsdb-redounded-postgresql-ha-postgresql-2" (ID: 1002)
2024-02-18 23:15:11.614 GMT [236] LOG:  restartpoint starting: time
2024-02-18 23:15:15.813 GMT [236] LOG:  restartpoint complete: wrote 43 buffers (0.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=4.153 s, sync=0.021 s, total=4.200 s; sync files=14, longest=0.013 s, average=0.002 s; distance=16647 kB, estimate=16647 kB; lsn=0/6041F78, redo lsn=0/6041F40
2024-02-18 23:15:15.813 GMT [236] LOG:  recovery restart point at 0/6041F40
2024-02-18 23:15:15.813 GMT [236] DETAIL:  Last completed transaction was at log time 2024-02-18 23:10:00.192974+00.

And finally, the logs of the primary node. We can see that only the postgresql-2 slave node has connected to the primary:

kubectl -n test-bitnami logs -f bitnami-tsdb-redounded-postgresql-ha-postgresql-0
postgresql-repmgr 23:09:36.79 INFO  ==> 
postgresql-repmgr 23:09:36.79 INFO  ==> Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 23:09:36.79 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql-repmgr 23:09:36.79 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql-repmgr 23:09:36.79 INFO  ==> 
postgresql-repmgr 23:09:36.80 INFO  ==> ** Starting PostgreSQL with Replication Manager setup **
postgresql-repmgr 23:09:36.82 INFO  ==> Validating settings in REPMGR_* env vars...
postgresql-repmgr 23:09:36.82 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql-repmgr 23:09:36.82 INFO  ==> Querying all partner nodes for common upstream node...
postgresql-repmgr 23:09:36.87 INFO  ==> There are no nodes with primary role. Assuming the primary role...
postgresql-repmgr 23:09:36.87 INFO  ==> Preparing PostgreSQL configuration...
postgresql-repmgr 23:09:36.87 INFO  ==> postgresql.conf file not detected. Generating it...
postgresql-repmgr 23:09:36.93 INFO  ==> Preparing repmgr configuration...
postgresql-repmgr 23:09:36.94 INFO  ==> Initializing Repmgr...
postgresql-repmgr 23:09:36.95 INFO  ==> Initializing PostgreSQL database...
postgresql-repmgr 23:09:36.95 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected
postgresql-repmgr 23:09:36.95 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
postgresql-repmgr 23:09:38.22 INFO  ==> Starting PostgreSQL in background...
postgresql-repmgr 23:09:38.44 INFO  ==> Changing password of postgres
postgresql-repmgr 23:09:38.45 INFO  ==> Creating replication user repmgr
postgresql-repmgr 23:09:38.47 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down.... done
server stopped
postgresql-repmgr 23:09:38.68 INFO  ==> Configuring replication parameters
postgresql-repmgr 23:09:38.70 INFO  ==> Configuring fsync
postgresql-repmgr 23:09:38.70 INFO  ==> Starting PostgreSQL in background...
postgresql-repmgr 23:09:38.92 INFO  ==> Creating repmgr user: repmgr
postgresql-repmgr 23:09:38.96 INFO  ==> Creating repmgr database: repmgr
postgresql-repmgr 23:09:39.03 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down.... done
server stopped
postgresql-repmgr 23:09:39.43 INFO  ==> Starting PostgreSQL in background...
postgresql-repmgr 23:09:39.64 INFO  ==> Registering Primary...
postgresql-repmgr 23:09:39.75 INFO  ==> Loading custom scripts...
postgresql-repmgr 23:09:39.75 INFO  ==> Configuring synchronous_replication
postgresql-repmgr 23:09:39.76 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down.... done
server stopped
postgresql-repmgr 23:09:39.96 INFO  ==> ** PostgreSQL with Replication Manager setup finished! **

postgresql-repmgr 23:09:39.98 INFO  ==> Starting PostgreSQL in background...
waiting for server to start....2024-02-18 23:09:39.999 GMT [290] LOG:  pgaudit extension initialized
2024-02-18 23:09:40.010 GMT [290] LOG:  redirecting log output to logging collector process
2024-02-18 23:09:40.010 GMT [290] HINT:  Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2024-02-18 23:09:40.010 GMT [290] LOG:  starting PostgreSQL 16.2 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-02-18 23:09:40.040 GMT [290] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-02-18 23:09:40.061 GMT [290] LOG:  listening on IPv6 address "::", port 5432
2024-02-18 23:09:40.093 GMT [290] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-02-18 23:09:40.107 GMT [294] LOG:  database system was shut down at 2024-02-18 23:09:39 GMT
2024-02-18 23:09:40.115 GMT [290] LOG:  database system is ready to accept connections
 done
server started
postgresql-repmgr 23:09:40.19 INFO  ==> ** Starting repmgrd **
[2024-02-18 23:09:40] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2024-02-18 23:09:40] [NOTICE] starting monitoring of node "bitnami-tsdb-redounded-postgresql-ha-postgresql-0" (ID: 1000)
[2024-02-18 23:09:40] [NOTICE] monitoring cluster primary "bitnami-tsdb-redounded-postgresql-ha-postgresql-0" (ID: 1000)
2024-02-18 23:09:54.344 GMT [292] LOG:  checkpoint starting: immediate force wait
2024-02-18 23:09:54.412 GMT [292] LOG:  checkpoint complete: wrote 13 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.009 s, sync=0.012 s, total=0.069 s; sync files=8, longest=0.005 s, average=0.002 s; distance=16384 kB, estimate=16384 kB; lsn=0/5000060, redo lsn=0/5000028
[2024-02-18 23:09:58] [NOTICE] new standby "bitnami-tsdb-redounded-postgresql-ha-postgresql-2" (ID: 1002) has connected
2024-02-18 23:14:54.508 GMT [292] LOG:  checkpoint starting: time
2024-02-18 23:14:58.993 GMT [292] LOG:  checkpoint complete: wrote 45 buffers (0.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=4.431 s, sync=0.014 s, total=4.485 s; sync files=13, longest=0.004 s, average=0.001 s; distance=16647 kB, estimate=16647 kB; lsn=0/6041F78, redo lsn=0/6041F40

Are you using any custom parameters or values?

No custom values, only the default ones provided by the chart

What is the expected behavior?

The expected behavior is to have 2 slaves up and running

What do you see instead?

We only see the primary node up and running with one slave; the second slave is in CrashLoopBackOff because it gets stuck waiting for the primary.

Additional information

In case it helps, below is the output of kubectl describe for the 3 pods (the primary one and the two slaves).

The primary node:

kubectl -n test-bitnami describe pod bitnami-tsdb-redounded-postgresql-ha-postgresql-0
Name:             bitnami-tsdb-redounded-postgresql-ha-postgresql-0
Namespace:        test-bitnami
Priority:         0
Service Account:  bitnami-tsdb-redounded-postgresql-ha
Node:             datahub-local-worker2/172.18.0.3
Start Time:       Mon, 19 Feb 2024 00:09:23 +0100
Labels:           app.kubernetes.io/component=postgresql
                  app.kubernetes.io/instance=bitnami-tsdb-redounded
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=postgresql-ha
                  app.kubernetes.io/version=16.2.0
                  controller-revision-hash=bitnami-tsdb-redounded-postgresql-ha-postgresql-b78c9db67
                  helm.sh/chart=postgresql-ha-13.3.3
                  role=data
                  statefulset.kubernetes.io/pod-name=bitnami-tsdb-redounded-postgresql-ha-postgresql-0
Annotations:      <none>
Status:           Running
IP:               10.244.1.75
IPs:
  IP:           10.244.1.75
Controlled By:  StatefulSet/bitnami-tsdb-redounded-postgresql-ha-postgresql
Containers:
  postgresql:
    Container ID:   containerd://1c2e93b0beef08fc9008144962764a486685511fb1adeefbb42447e08b6cf3c3
    Image:          registry-1.docker.io/bitnami/postgresql-repmgr:16.2.0-debian-11-r18
    Image ID:       registry-1.docker.io/bitnami/postgresql-repmgr@sha256:2fbfb8169c474bf00a1f5c56556ad56fc4d7dc6d28350d7fbc94eb48e9cf6128
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 19 Feb 2024 00:09:36 +0100
    Ready:          True
    Restart Count:  0
    Liveness:       exec [bash -ec PGPASSWORD=$POSTGRES_PASSWORD psql -w -U "postgres" -d "postgres" -h 127.0.0.1 -p 5432 -c "SELECT 1"] delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:      exec [bash -ec PGPASSWORD=$POSTGRES_PASSWORD psql -w -U "postgres" -d "postgres" -h 127.0.0.1 -p 5432 -c "SELECT 1"] delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      BITNAMI_DEBUG:                           false
      POSTGRESQL_VOLUME_DIR:                   /bitnami/postgresql
      PGDATA:                                  /bitnami/postgresql/data
      POSTGRES_USER:                           postgres
      POSTGRES_PASSWORD:                       <set to the key 'password' in secret 'bitnami-tsdb-redounded-postgresql-ha-postgresql'>  Optional: false
      POSTGRES_DB:                             postgres
      POSTGRESQL_LOG_HOSTNAME:                 true
      POSTGRESQL_LOG_CONNECTIONS:              false
      POSTGRESQL_LOG_DISCONNECTIONS:           false
      POSTGRESQL_PGAUDIT_LOG_CATALOG:          off
      POSTGRESQL_CLIENT_MIN_MESSAGES:          error
      POSTGRESQL_SHARED_PRELOAD_LIBRARIES:     pgaudit, repmgr
      POSTGRESQL_ENABLE_TLS:                   no
      POSTGRESQL_PORT_NUMBER:                  5432
      REPMGR_PORT_NUMBER:                      5432
      REPMGR_PRIMARY_PORT:                     5432
      MY_POD_NAME:                             bitnami-tsdb-redounded-postgresql-ha-postgresql-0 (v1:metadata.name)
      REPMGR_UPGRADE_EXTENSION:                no
      REPMGR_PGHBA_TRUST_ALL:                  no
      REPMGR_MOUNTED_CONF_DIR:                 /bitnami/repmgr/conf
      REPMGR_NAMESPACE:                        test-bitnami (v1:metadata.namespace)
      REPMGR_PARTNER_NODES:                    bitnami-tsdb-redounded-postgresql-ha-postgresql-0.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local,bitnami-tsdb-redounded-postgresql-ha-postgresql-1.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local,bitnami-tsdb-redounded-postgresql-ha-postgresql-2.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local,
      REPMGR_PRIMARY_HOST:                     bitnami-tsdb-redounded-postgresql-ha-postgresql-0.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local
      REPMGR_NODE_NAME:                        $(MY_POD_NAME)
      REPMGR_NODE_NETWORK_NAME:                $(MY_POD_NAME).bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local
      REPMGR_NODE_TYPE:                        data
      REPMGR_LOG_LEVEL:                        NOTICE
      REPMGR_CONNECT_TIMEOUT:                  5
      REPMGR_RECONNECT_ATTEMPTS:               2
      REPMGR_RECONNECT_INTERVAL:               3
      REPMGR_USERNAME:                         repmgr
      REPMGR_PASSWORD:                         <set to the key 'repmgr-password' in secret 'bitnami-tsdb-redounded-postgresql-ha-postgresql'>  Optional: false
      REPMGR_DATABASE:                         repmgr
      REPMGR_FENCE_OLD_PRIMARY:                no
      REPMGR_CHILD_NODES_CHECK_INTERVAL:       5
      REPMGR_CHILD_NODES_CONNECTED_MIN_COUNT:  1
      REPMGR_CHILD_NODES_DISCONNECT_TIMEOUT:   30
    Mounts:
      /bitnami/postgresql from data (rw)
      /pre-stop.sh from hooks-scripts (rw,path="pre-stop.sh")
      /readiness-probe.sh from hooks-scripts (rw,path="readiness-probe.sh")
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-bitnami-tsdb-redounded-postgresql-ha-postgresql-0
    ReadOnly:   false
  hooks-scripts:
    Type:        ConfigMap (a volume populated by a ConfigMap)
    Name:        bitnami-tsdb-redounded-postgresql-ha-postgresql-hooks-scripts
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

The slave which starts successfully:

kubectl -n test-bitnami describe pod bitnami-tsdb-redounded-postgresql-ha-postgresql-2
Name:             bitnami-tsdb-redounded-postgresql-ha-postgresql-2
Namespace:        test-bitnami
Priority:         0
Service Account:  bitnami-tsdb-redounded-postgresql-ha
Node:             datahub-local-worker3/172.18.0.2
Start Time:       Mon, 19 Feb 2024 00:09:22 +0100
Labels:           app.kubernetes.io/component=postgresql
                  app.kubernetes.io/instance=bitnami-tsdb-redounded
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=postgresql-ha
                  app.kubernetes.io/version=16.2.0
                  controller-revision-hash=bitnami-tsdb-redounded-postgresql-ha-postgresql-b78c9db67
                  helm.sh/chart=postgresql-ha-13.3.3
                  role=data
                  statefulset.kubernetes.io/pod-name=bitnami-tsdb-redounded-postgresql-ha-postgresql-2
Annotations:      <none>
Status:           Running
IP:               10.244.3.59
IPs:
  IP:           10.244.3.59
Controlled By:  StatefulSet/bitnami-tsdb-redounded-postgresql-ha-postgresql
Containers:
  postgresql:
    Container ID:   containerd://85ed5c264b45ca8d3891971f5a660aae7637aad1d65ad43867333bd1bf279079
    Image:          registry-1.docker.io/bitnami/postgresql-repmgr:16.2.0-debian-11-r18
    Image ID:       registry-1.docker.io/bitnami/postgresql-repmgr@sha256:2fbfb8169c474bf00a1f5c56556ad56fc4d7dc6d28350d7fbc94eb48e9cf6128
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 19 Feb 2024 00:09:33 +0100
    Ready:          True
    Restart Count:  0
    Liveness:       exec [bash -ec PGPASSWORD=$POSTGRES_PASSWORD psql -w -U "postgres" -d "postgres" -h 127.0.0.1 -p 5432 -c "SELECT 1"] delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:      exec [bash -ec PGPASSWORD=$POSTGRES_PASSWORD psql -w -U "postgres" -d "postgres" -h 127.0.0.1 -p 5432 -c "SELECT 1"] delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      BITNAMI_DEBUG:                           false
      POSTGRESQL_VOLUME_DIR:                   /bitnami/postgresql
      PGDATA:                                  /bitnami/postgresql/data
      POSTGRES_USER:                           postgres
      POSTGRES_PASSWORD:                       <set to the key 'password' in secret 'bitnami-tsdb-redounded-postgresql-ha-postgresql'>  Optional: false
      POSTGRES_DB:                             postgres
      POSTGRESQL_LOG_HOSTNAME:                 true
      POSTGRESQL_LOG_CONNECTIONS:              false
      POSTGRESQL_LOG_DISCONNECTIONS:           false
      POSTGRESQL_PGAUDIT_LOG_CATALOG:          off
      POSTGRESQL_CLIENT_MIN_MESSAGES:          error
      POSTGRESQL_SHARED_PRELOAD_LIBRARIES:     pgaudit, repmgr
      POSTGRESQL_ENABLE_TLS:                   no
      POSTGRESQL_PORT_NUMBER:                  5432
      REPMGR_PORT_NUMBER:                      5432
      REPMGR_PRIMARY_PORT:                     5432
      MY_POD_NAME:                             bitnami-tsdb-redounded-postgresql-ha-postgresql-2 (v1:metadata.name)
      REPMGR_UPGRADE_EXTENSION:                no
      REPMGR_PGHBA_TRUST_ALL:                  no
      REPMGR_MOUNTED_CONF_DIR:                 /bitnami/repmgr/conf
      REPMGR_NAMESPACE:                        test-bitnami (v1:metadata.namespace)
      REPMGR_PARTNER_NODES:                    bitnami-tsdb-redounded-postgresql-ha-postgresql-0.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local,bitnami-tsdb-redounded-postgresql-ha-postgresql-1.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local,bitnami-tsdb-redounded-postgresql-ha-postgresql-2.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local,
      REPMGR_PRIMARY_HOST:                     bitnami-tsdb-redounded-postgresql-ha-postgresql-0.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local
      REPMGR_NODE_NAME:                        $(MY_POD_NAME)
      REPMGR_NODE_NETWORK_NAME:                $(MY_POD_NAME).bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local
      REPMGR_NODE_TYPE:                        data
      REPMGR_LOG_LEVEL:                        NOTICE
      REPMGR_CONNECT_TIMEOUT:                  5
      REPMGR_RECONNECT_ATTEMPTS:               2
      REPMGR_RECONNECT_INTERVAL:               3
      REPMGR_USERNAME:                         repmgr
      REPMGR_PASSWORD:                         <set to the key 'repmgr-password' in secret 'bitnami-tsdb-redounded-postgresql-ha-postgresql'>  Optional: false
      REPMGR_DATABASE:                         repmgr
      REPMGR_FENCE_OLD_PRIMARY:                no
      REPMGR_CHILD_NODES_CHECK_INTERVAL:       5
      REPMGR_CHILD_NODES_CONNECTED_MIN_COUNT:  1
      REPMGR_CHILD_NODES_DISCONNECT_TIMEOUT:   30
    Mounts:
      /bitnami/postgresql from data (rw)
      /pre-stop.sh from hooks-scripts (rw,path="pre-stop.sh")
      /readiness-probe.sh from hooks-scripts (rw,path="readiness-probe.sh")
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-bitnami-tsdb-redounded-postgresql-ha-postgresql-2
    ReadOnly:   false
  hooks-scripts:
    Type:        ConfigMap (a volume populated by a ConfigMap)
    Name:        bitnami-tsdb-redounded-postgresql-ha-postgresql-hooks-scripts
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

The slave which fails to start (stuck waiting for the primary):

kubectl -n test-bitnami describe pod bitnami-tsdb-redounded-postgresql-ha-postgresql-1
Name:             bitnami-tsdb-redounded-postgresql-ha-postgresql-1
Namespace:        test-bitnami
Priority:         0
Service Account:  bitnami-tsdb-redounded-postgresql-ha
Node:             datahub-local-worker/172.18.0.5
Start Time:       Mon, 19 Feb 2024 00:09:24 +0100
Labels:           app.kubernetes.io/component=postgresql
                  app.kubernetes.io/instance=bitnami-tsdb-redounded
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=postgresql-ha
                  app.kubernetes.io/version=16.2.0
                  controller-revision-hash=bitnami-tsdb-redounded-postgresql-ha-postgresql-b78c9db67
                  helm.sh/chart=postgresql-ha-13.3.3
                  role=data
                  statefulset.kubernetes.io/pod-name=bitnami-tsdb-redounded-postgresql-ha-postgresql-1
Annotations:      <none>
Status:           Running
IP:               10.244.2.48
IPs:
  IP:           10.244.2.48
Controlled By:  StatefulSet/bitnami-tsdb-redounded-postgresql-ha-postgresql
Containers:
  postgresql:
    Container ID:   containerd://437ff1b1ea9229fc0afe75a6433eed499498b287b4486f862d36e652ab7af097
    Image:          registry-1.docker.io/bitnami/postgresql-repmgr:16.2.0-debian-11-r18
    Image ID:       registry-1.docker.io/bitnami/postgresql-repmgr@sha256:2fbfb8169c474bf00a1f5c56556ad56fc4d7dc6d28350d7fbc94eb48e9cf6128
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 19 Feb 2024 02:00:07 +0100
      Finished:     Mon, 19 Feb 2024 02:01:22 +0100
    Ready:          False
    Restart Count:  31
    Liveness:       exec [bash -ec PGPASSWORD=$POSTGRES_PASSWORD psql -w -U "postgres" -d "postgres" -h 127.0.0.1 -p 5432 -c "SELECT 1"] delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:      exec [bash -ec PGPASSWORD=$POSTGRES_PASSWORD psql -w -U "postgres" -d "postgres" -h 127.0.0.1 -p 5432 -c "SELECT 1"] delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      BITNAMI_DEBUG:                           false
      POSTGRESQL_VOLUME_DIR:                   /bitnami/postgresql
      PGDATA:                                  /bitnami/postgresql/data
      POSTGRES_USER:                           postgres
      POSTGRES_PASSWORD:                       <set to the key 'password' in secret 'bitnami-tsdb-redounded-postgresql-ha-postgresql'>  Optional: false
      POSTGRES_DB:                             postgres
      POSTGRESQL_LOG_HOSTNAME:                 true
      POSTGRESQL_LOG_CONNECTIONS:              false
      POSTGRESQL_LOG_DISCONNECTIONS:           false
      POSTGRESQL_PGAUDIT_LOG_CATALOG:          off
      POSTGRESQL_CLIENT_MIN_MESSAGES:          error
      POSTGRESQL_SHARED_PRELOAD_LIBRARIES:     pgaudit, repmgr
      POSTGRESQL_ENABLE_TLS:                   no
      POSTGRESQL_PORT_NUMBER:                  5432
      REPMGR_PORT_NUMBER:                      5432
      REPMGR_PRIMARY_PORT:                     5432
      MY_POD_NAME:                             bitnami-tsdb-redounded-postgresql-ha-postgresql-1 (v1:metadata.name)
      REPMGR_UPGRADE_EXTENSION:                no
      REPMGR_PGHBA_TRUST_ALL:                  no
      REPMGR_MOUNTED_CONF_DIR:                 /bitnami/repmgr/conf
      REPMGR_NAMESPACE:                        test-bitnami (v1:metadata.namespace)
      REPMGR_PARTNER_NODES:                    bitnami-tsdb-redounded-postgresql-ha-postgresql-0.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local,bitnami-tsdb-redounded-postgresql-ha-postgresql-1.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local,bitnami-tsdb-redounded-postgresql-ha-postgresql-2.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local,
      REPMGR_PRIMARY_HOST:                     bitnami-tsdb-redounded-postgresql-ha-postgresql-0.bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local
      REPMGR_NODE_NAME:                        $(MY_POD_NAME)
      REPMGR_NODE_NETWORK_NAME:                $(MY_POD_NAME).bitnami-tsdb-redounded-postgresql-ha-postgresql-headless.$(REPMGR_NAMESPACE).svc.cluster.local
      REPMGR_NODE_TYPE:                        data
      REPMGR_LOG_LEVEL:                        NOTICE
      REPMGR_CONNECT_TIMEOUT:                  5
      REPMGR_RECONNECT_ATTEMPTS:               2
      REPMGR_RECONNECT_INTERVAL:               3
      REPMGR_USERNAME:                         repmgr
      REPMGR_PASSWORD:                         <set to the key 'repmgr-password' in secret 'bitnami-tsdb-redounded-postgresql-ha-postgresql'>  Optional: false
      REPMGR_DATABASE:                         repmgr
      REPMGR_FENCE_OLD_PRIMARY:                no
      REPMGR_CHILD_NODES_CHECK_INTERVAL:       5
      REPMGR_CHILD_NODES_CONNECTED_MIN_COUNT:  1
      REPMGR_CHILD_NODES_DISCONNECT_TIMEOUT:   30
    Mounts:
      /bitnami/postgresql from data (rw)
      /pre-stop.sh from hooks-scripts (rw,path="pre-stop.sh")
      /readiness-probe.sh from hooks-scripts (rw,path="readiness-probe.sh")
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-bitnami-tsdb-redounded-postgresql-ha-postgresql-1
    ReadOnly:   false
  hooks-scripts:
    Type:        ConfigMap (a volume populated by a ConfigMap)
    Name:        bitnami-tsdb-redounded-postgresql-ha-postgresql-hooks-scripts
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  5m17s (x250 over 115m)  kubelet  Readiness probe failed: psql: error: connection to server at "127.0.0.1", port 5432 failed: Connection refused
           Is the server running on that host and accepting TCP/IP connections?
  Warning  BackOff  21s (x357 over 111m)  kubelet  Back-off restarting failed container postgresql in pod bitnami-tsdb-redounded-postgresql-ha-postgresql-1_test-bitnami(3714cf1e-bff8-4710-b53a-3e2bb2a40026)
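The probe settings and the Last State timestamps in the describe output above can be compared with some quick arithmetic. This is a rough sketch with hypothetical helper names; it ignores kubelet jitter and the time each probe takes to run, so the result is only an estimate.

```shell
# Earliest time (seconds after container start) at which the liveness probe
# can have failed `failures` times in a row: the first probe runs after
# `delay` seconds, then one probe every `period` seconds.
# Arguments: delay period failures
liveness_restart_after() {
  echo $(( $1 + ($3 - 1) * $2 ))
}

# Seconds between two HH:MM:SS times on the same day.
# Arguments: start end
elapsed_seconds() {
  echo "$1 $2" | awk -F'[: ]' \
    '{ print ($4*3600 + $5*60 + $6) - ($1*3600 + $2*60 + $3) }'
}

# With the values from the describe output:
#   liveness_restart_after 30 10 6      -> 80  (delay=30s period=10s #failure=6)
#   elapsed_seconds 02:00:07 02:01:22   -> 75  (Started / Finished)
```

The container ran for about 75 s and ended with Exit Code 1, which is slightly less than the roughly 80 s the liveness probe would need to reach six failures. One possible reading is that the container exits on its own while waiting for the primary rather than being killed by the liveness probe, but these numbers alone do not establish that.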

nleeuskadi avatar Feb 19 '24 17:02 nleeuskadi

Hi @nleeuskadi ,

I was not able to reproduce the issue. Could you launch the chart with postgresql.image.debug=true? This may provide more insight into the issue. Also, you could try disabling the liveness/readiness probes for this postgresql-ha cluster as a workaround, although that is not the ideal solution.
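As a sketch of what that could look like on the command line: `postgresql.image.debug` is the value named above, while `postgresql.livenessProbe.enabled` and `postgresql.readinessProbe.enabled` are assumed value names that should be checked against `helm show values` for the chart version in use. `redeploy_with_debug` is a hypothetical helper name.

```shell
# Re-deploy with container debug logging on and, as a temporary workaround,
# the PostgreSQL liveness/readiness probes off.
# HELM can be overridden (e.g. HELM=echo) to print the command instead of
# running it.
# Arguments: release namespace
redeploy_with_debug() {
  release="$1"; ns="$2"
  "${HELM:-helm}" upgrade --install "$release" \
    oci://registry-1.docker.io/bitnamicharts/postgresql-ha \
    --namespace "$ns" \
    --set postgresql.image.debug=true \
    --set postgresql.livenessProbe.enabled=false \
    --set postgresql.readinessProbe.enabled=false
}

# Usage (requires helm and access to the cluster):
#   redeploy_with_debug bitnami-redounded test-bitnami
```

When upgrading an existing release, the chart may ask for the current passwords; for a throwaway test setup, uninstalling and installing again with the same flags is probably simpler.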

dgomezleon avatar Feb 20 '24 11:02 dgomezleon

Hello @dgomezleon, thank you for your help :) I will try what you suggested with postgresql.image.debug=true and get back to you. Cheers.

nleeuskadi avatar Feb 21 '24 13:02 nleeuskadi

Hi Bitnami community,

Since I opened the ticket, I have not been able to reproduce the issue after reinstalling my K8s cluster. I guess my problem was due to something wrong in my cluster, but I could not find out what exactly.

Thank you for your help.

I am closing the ticket.

Cheers!

nleeuskadi avatar Mar 02 '24 19:03 nleeuskadi