postgres-operator

PGO 5.0.5: single-pod cluster fails to come back after restart

Open • Subetov opened this issue 3 years ago

Overview

If the pod is restarted, either by triggering a restart via annotation or by deleting the pod, it never comes back up.

Environment

Please provide the following details:

  • Platform: Kubernetes
  • Platform Version: 1.21.4
  • Postgres Version: 13
  • Storage: hostpath (local provisioner)

Steps to Reproduce

  1. Create a cluster with a single instance (replicas: 1).
  2. Trigger a restart, either via the restart annotation or by deleting the pod.
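Step 2 can be done by deleting the pod (kubectl delete pod <pod-name>) or through an annotation. As a minimal sketch, assuming the approach described in the PGO v5 documentation of changing a value under spec.metadata.annotations on the PostgresCluster, the Python below builds the JSON merge patch and prints a matching kubectl patch command. The cluster name hippo, the namespace postgres-operator and the annotation key restarted are illustrative assumptions; substitute your own.

```python
import json
from datetime import datetime, timezone


def restart_patch(timestamp: str) -> str:
    """Return a JSON merge patch that sets a PostgresCluster annotation.

    Assumption: changing a value under spec.metadata.annotations is
    propagated to the Postgres pods, which the operator then rolls out
    as a restart. The key "restarted" is illustrative.
    """
    patch = {"spec": {"metadata": {"annotations": {"restarted": timestamp}}}}
    # Compact separators keep the patch on one shell-friendly line.
    return json.dumps(patch, separators=(",", ":"))


if __name__ == "__main__":
    # Hypothetical names; replace with your cluster and namespace.
    cluster, namespace = "hippo", "postgres-operator"
    body = restart_patch(datetime.now(timezone.utc).isoformat())
    print(f"kubectl patch postgrescluster/{cluster} -n {namespace} "
          f"--type merge --patch '{body}'")
```

Using the current timestamp means the annotation value differs on every run, so each patch is an actual change; re-applying an identical value would produce no diff and therefore no restart.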

EXPECTED

The pod restarts and becomes ready.

ACTUAL

Only 1/2 containers become ready; the pod never becomes ready.

Logs

database 2022-04-15 13:54:13,489 INFO: following a different leader because i am not the healthiest node
database 2022-04-15 13:54:13,494 ERROR: Exception during CHECKPOINT
database Traceback (most recent call last):
database   File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 600, in checkpoint
database     with get_connection_cursor(**connect_kwargs) as cur:
database   File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
database     return next(self.gen)
database   File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/connection.py", line 44, in get_connection_cursor
database     conn = psycopg.connect(**kwargs)
database   File "/usr/lib64/python3.6/site-packages/psycopg2/__init__.py", line 127, in connect
database     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
database psycopg2.OperationalError: FATAL:  the database system is starting up
database
database 2022-04-15 13:54:13.819 UTC [567] LOG:  pgaudit extension initialized
database 2022-04-15 13:54:13,830 INFO: postmaster pid=567
database 2022-04-15 13:54:13.840 UTC [567] LOG:  redirecting log output to logging collector process
database 2022-04-15 13:54:13.840 UTC [567] HINT:  Future log output will appear in directory "log".
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database 2022-04-15 13:54:19,288 INFO: Lock owner: None; I am ice-postgres-dc1-instance1-684f-0
database 2022-04-15 13:54:19,289 INFO: not healthy enough for leader race
database 2022-04-15 13:54:19,289 INFO: changing primary_conninfo and restarting in progress
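When triaging logs like the excerpt above, it can help to look past the CHECKPOINT traceback and check which state Patroni keeps reporting. The following is a hypothetical helper, not part of PGO or Patroni; its line formats are inferred only from this excerpt (Patroni lines carry a comma before the milliseconds, Postgres lines a dot and "UTC"), and it also tolerates the box-drawing borders some terminal log viewers add.

```python
import re
from collections import Counter

# Patroni lines in the excerpt look like
# "database 2022-04-15 13:54:19,289 INFO: <message>" (comma before the
# milliseconds); Postgres's own lines use "13:54:13.819 UTC" and are skipped.
PATRONI_LINE = re.compile(
    r"^(?:database\s+)?"
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r"(DEBUG|INFO|WARNING|ERROR|CRITICAL): (.*)$"
)


def summarize(lines):
    """Hypothetical triage helper: count Patroni messages per level, keep the
    last Patroni message, and count 'rejecting connections' lines."""
    levels = Counter()
    last_message = None
    rejecting = 0
    for raw in lines:
        # Drop surrounding whitespace and box-drawing borders left by
        # terminal log viewers.
        line = raw.strip().strip("│").strip()
        if line.endswith("- rejecting connections"):
            rejecting += 1
            continue
        match = PATRONI_LINE.match(line)
        if match:
            levels[match.group(1)] += 1
            last_message = match.group(2)
    return levels, last_message, rejecting
```

For a saved log file (the file name is up to you), summarize(open("database.log")) returns the per-level counts, the last Patroni message (here, "changing primary_conninfo and restarting in progress") and how many "rejecting connections" lines were seen.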

Subetov, Apr 15 '22 14:04

Same problem here. Any update or solution?

derlin, May 30 '22 12:05

Same problem here; this should be hotfixed.

tungtt1006, Oct 10 '22 11:10

Hello, I cannot replicate this with PGO 5.0.5 / Postgres 13. I do notice that the Postgres image I'm using has Patroni 2.1.2. On closer inspection, though, the Patroni error in your logs looks like a symptom rather than the cause of your pod failing to come up.

If any of you are still experiencing this problem, could you provide your postgrescluster.yaml and also your Postgres images (if you're using the RELATED_ env vars in PGO to set the images in the Postgres pods)?

benjaminjb, Dec 29 '22 17:12
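One way to gather the RELATED_ image settings requested above is to dump the operator Deployment and filter its environment. A minimal sketch, assuming the operator runs as a Deployment named pgo in the postgres-operator namespace (adjust both to your install): save the output of kubectl get deployment pgo -n postgres-operator -o json to a file, then pass it to the hypothetical helpers below.

```python
import json


def related_images(deployment: dict) -> dict:
    """Collect RELATED_* environment variables from every container of a
    Deployment object (the parsed output of kubectl get deployment -o json).

    Only literal values are returned; entries set via valueFrom are skipped.
    """
    images = {}
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        for env in container.get("env", []):
            name = env.get("name", "")
            if name.startswith("RELATED_") and "value" in env:
                images[name] = env["value"]
    return images


def related_images_from_file(path: str) -> dict:
    """Load a saved Deployment JSON file (path is up to you) and filter it."""
    with open(path) as f:
        return related_images(json.load(f))
```

For example, related_images_from_file("pgo.json") returns a mapping such as {"RELATED_IMAGE_POSTGRES_13": "<image>", ...} that can be pasted into the issue alongside postgrescluster.yaml.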