[bitnami/postgresql-ha] postgresql-ha-postgresql-0 crashes after a while

Open nightmare-rg opened this issue 4 months ago • 2 comments

Name and Version

bitnami/postgresql-ha 14.2.33

What architecture are you using?

amd64

What steps will reproduce the bug?

Set up K3S Cluster with terraform-hcloud-kube-hetzner with 4 large agent servers

NAME                               STATUS   ROLES                       AGE   VERSION
k3s-agent-large-aln          Ready    <none>                      47h   v1.30.5+k3s1
k3s-agent-large-fsn-hod      Ready    <none>                      47h   v1.30.5+k3s1
k3s-agent-large-fsn-vsw      Ready    <none>                      47h   v1.30.5+k3s1
k3s-agent-large-kdg          Ready    <none>                      47h   v1.30.5+k3s1
k3s-control-plane-fsn1-dfk   Ready    control-plane,etcd,master   47h   v1.30.5+k3s1
k3s-control-plane-hel1-lzp   Ready    control-plane,etcd,master   47h   v1.30.5+k3s1
k3s-control-plane-nbg1-zxy   Ready    control-plane,etcd,master   47h   v1.30.5+k3s1
k3s-egress-aai               Ready    <none>                      47h   v1.30.5+k3s1

Create the namespace database and install bitnami/postgresql-ha:

values.yml

global:
  storageClass: longhorn
  persistence:
    size: 25Gi

postgresql:
  image:
    tag: 14-debian-12
    debug: true
  replicaCount: 3
  maxConnections: 1000
  postgresConnectionLimit: 1000
  dbUserConnectionLimit: 1000

pgpool:
  replicaCount: 3
  maxPool: 20
  numInitChildren: 100
  childLifeTime: 300
  clientIdleLimit: 300
  clientIdleLimitInTransaction: 0
  reservedConnections: 0

I tried both the default tag 16.4.0-debian-12-r22 and the tag 14-debian-12.
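The install step could look roughly like the sketch below. The release name postgresql is inferred from the pod names in the events further down; the repo alias bitnami, its URL, the file name values.yml in the current directory, and the wrapper function name are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the install step. Assumed, not stated in the report:
# repo alias "bitnami" and its URL, and values.yml in the current directory.
# install_postgresql_ha is a hypothetical wrapper name.
install_postgresql_ha() {
  helm repo add bitnami https://charts.bitnami.com/bitnami
  helm repo update
  kubectl create namespace database
  # Release name "postgresql" matches the pod prefix postgresql-postgresql-ha-*.
  helm install postgresql bitnami/postgresql-ha \
    --namespace database \
    --version 14.2.33 \
    --values values.yml
}
```

Calling install_postgresql_ha runs the four commands in order; overriding postgresql.image.tag in values.yml switches between the two image tags mentioned above.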

Install the GitLab Helm chart with the DB credentials:

values.yml

global:
  psql:
    host: postgresql-postgresql-ha-pgpool.database.svc.cluster.local
    username: postgres
    database: gitlabhq_production
    password:
      secret: psql-password
      key: password

gitlab-runner:
  install: false

postgresql:
  install: false

I removed some other options for better readability.

GitLab works fine, but after some time the postgresql-ha-postgresql-0 node crashes. When I remove this node, the cluster recovers, but after some time it crashes again.

What is the expected behavior?

The PostgreSQL cluster runs without crashes, like my other GitLab installation with postgresql-ha-13.0.0 and 16.1.0 on v1.28.14+k3s1.

What do you see instead?

LAST SEEN   TYPE      REASON      OBJECT                                                 MESSAGE
22m         Normal    Pulled      pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Container image "docker.io/bitnami/pgpool:4.5.4-debian-12-r0" already present on machine
22m         Normal    Created     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Created container pgpool
22m         Normal    Started     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Started container pgpool
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Readiness probe failed: psql: error: connection to server on socket "/opt/bitnami/pgpool/tmp/.s.PGSQL.5432" failed: FATAL:  failed to create a backend 0 connection...
22m         Normal    Killing     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Container pgpool failed liveness probe, will be restarted
23m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Liveness probe failed: pgpool 09:32:07.82 INFO  ==> Checking pgpool health......
23m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Readiness probe failed: psql: error: connection to server on socket "/opt/bitnami/pgpool/tmp/.s.PGSQL.5432" failed: FATAL:  unable to read data from DB node 0...
23m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Liveness probe failed: pgpool 09:32:15.62 INFO  ==> Checking pgpool health......
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Liveness probe failed: pgpool 09:32:25.48 INFO  ==> Checking pgpool health......
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Liveness probe failed: pgpool 09:32:35.62 INFO  ==> Checking pgpool health......
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Liveness probe failed: pgpool 09:32:45.52 INFO  ==> Checking pgpool health......
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Readiness probe failed: psql: error: connection to server on socket "/opt/bitnami/pgpool/tmp/.s.PGSQL.5432" failed: Connection refused...
21m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Readiness probe failed: psql: error: connection to server on socket "/opt/bitnami/pgpool/tmp/.s.PGSQL.5432" failed: server closed the connection unexpectedly...
21m         Normal    Pulled      pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Container image "docker.io/bitnami/pgpool:4.5.4-debian-12-r0" already present on machine
21m         Normal    Created     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Created container pgpool
22m         Normal    Started     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Started container pgpool
23m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Readiness probe failed: command "bash -ec PGPASSWORD=${PGPOOL_POSTGRES_PASSWORD} psql -U \"postgres\" -d \"postgres\" -h /opt/bitnami/pgpool/tmp -tA -c \"SELECT 1\" >/dev/null" timed out
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Readiness probe failed:
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Liveness probe failed:
2m55s       Warning   BackOff     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Back-off restarting failed container pgpool in pod postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886_database(09bba747-0ac4-4e38-9774-999bc1637f0d)
21m         Warning   Failed      pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?): unknown
23m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-t44zh   Readiness probe failed: command "bash -ec PGPASSWORD=${PGPOOL_POSTGRES_PASSWORD} psql -U \"postgres\" -d \"postgres\" -h /opt/bitnami/pgpool/tmp -tA -c \"SELECT 1\" >/dev/null" timed out
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-t44zh   Liveness probe failed: command "/opt/bitnami/scripts/pgpool/healthcheck.sh" timed out
21m         Normal    Pulled      pod/postgresql-postgresql-ha-postgresql-0              Container image "docker.io/bitnami/postgresql-repmgr:14-debian-12" already present on machine
21m         Normal    Created     pod/postgresql-postgresql-ha-postgresql-0              Created container postgresql
21m         Normal    Started     pod/postgresql-postgresql-ha-postgresql-0              Started container postgresql
2m59s       Warning   BackOff     pod/postgresql-postgresql-ha-postgresql-0              Back-off restarting failed container postgresql in pod postgresql-postgresql-ha-postgresql-0_database(0e58e555-9ae0-4a0e-a610-548f6833c800)
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-postgresql-0              Readiness probe failed: 127.0.0.1:5432 - rejecting connections
21m         Warning   Unhealthy   pod/postgresql-postgresql-ha-postgresql-0              Readiness probe failed: 127.0.0.1:5432 - no response

Screenshot (Bildschirmfoto 2024-10-09 um 11 56 15)

database-postgresql-postgresql-ha-postgresql-1728467925773087000.log

I can't find anything in the log that explains why the master node crashes.
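When the current log shows nothing, the termination reason and the log of the previous container instance are a common next place to look. A minimal sketch, assuming kubectl access to the database namespace: the function names are hypothetical helpers, and the pod and container names are taken from the events above.

```shell
#!/usr/bin/env bash
ns=database
pod=postgresql-postgresql-ha-postgresql-0

# Print why the previous container instance(s) of a pod terminated
# (for example OOMKilled or Error). Empty output means no recorded restart.
last_termination_reason() {
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Print the log of the previous (crashed) instance of the postgresql container.
previous_log() {
  kubectl -n "$1" logs "$2" -c postgresql --previous
}
```

Usage would be last_termination_reason "$ns" "$pod" followed by previous_log "$ns" "$pod". One of the pgpool events above reports an OOM kill, so the termination reason may be worth checking for the postgresql container as well.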

Additional information

On my other GitLab setup, I initially had problems with the PostgreSQL connection limit, so I increased it to 1000, and that setup has run really well for about 10 months. I used the same values and setup instructions for my new cluster, only with the newer version, and postgresql-ha is unstable under any load from my GitLab instance.
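For context on the connection limit, the Pgpool-II documentation gives the sizing rule max_pool * num_init_children <= max_connections - superuser_reserved_connections. The sketch below checks the values from values.yml against that rule. It assumes the chart passes maxConnections, numInitChildren and maxPool through to those settings unchanged, and that superuser_reserved_connections is at the PostgreSQL default of 3; neither assumption is verified here.

```shell
#!/usr/bin/env bash
# Values copied from the postgresql-ha values.yml above.
max_connections=1000    # postgresql.maxConnections
pgpool_replicas=3       # pgpool.replicaCount
num_init_children=100   # pgpool.numInitChildren
max_pool=20             # pgpool.maxPool
# Assumption: PostgreSQL default for superuser_reserved_connections.
reserved=3

# Client sessions that all Pgpool-II pods together can serve at once.
client_slots=$((pgpool_replicas * num_init_children))
# Worst case per PostgreSQL node: every child caches max_pool connections,
# which requires that many distinct user/database pairs.
worst_case_backend=$((client_slots * max_pool))
available=$((max_connections - reserved))

echo "client slots:             $client_slots"
echo "worst-case backend conns: $worst_case_backend"
echo "available on backend:     $available"
if (( worst_case_backend > available )); then
  echo "worst case exceeds the backend connection limit"
fi
```

The worst case only applies when the children cache connections for many distinct user/database pairs. With few pairs, the number of backend connections per node stays closer to the number of client slots.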

Oct 09 '24 10:10 nightmare-rg
