postgres-operator
PGO 5.0.5 one pod cluster fails after restart.
Overview
If the pod is restarted, either by triggering a restart via annotation or by deleting the pod, it never becomes ready again.
Environment
- Platform: Kubernetes
- Platform Version: 1.21.4
- Postgres Version: 13
- Storage: hostpath (local provisioner)
Steps to Reproduce
- Create a cluster with one replica (a single Postgres pod).
- Trigger a restart, either via annotation or by deleting the pod.
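For anyone trying to reproduce this, the annotation-based restart can be sketched roughly as below. This is a minimal sketch, not the reporter's exact command: it assumes the restart is triggered by changing a `restarted` annotation under `spec.metadata.annotations` of the PostgresCluster, and the cluster name `ice-postgres-dc1` (guessed from the pod name in the logs) and the namespace `postgres-operator` are assumptions. The script only prints the `kubectl patch` command; it does not run it.

```python
import json
import shlex
from datetime import datetime, timezone


def restart_patch(timestamp: str) -> dict:
    """Build a merge patch that changes a pod-template annotation.

    Assumption: changing spec.metadata.annotations.restarted makes the
    operator roll the Postgres pods.
    """
    return {"spec": {"metadata": {"annotations": {"restarted": timestamp}}}}


def kubectl_patch_command(cluster: str, namespace: str, patch: dict) -> str:
    """Render the kubectl invocation as a copy-pasteable shell string."""
    args = [
        "kubectl", "patch", f"postgrescluster/{cluster}",
        "--namespace", namespace,
        "--type", "merge",
        "--patch", json.dumps(patch),
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Cluster name and namespace are hypothetical; adjust to your setup.
    stamp = datetime.now(timezone.utc).isoformat()
    print(kubectl_patch_command("ice-postgres-dc1", "postgres-operator",
                                restart_patch(stamp)))
```

The other reproduction path mentioned in the overview is simply deleting the pod, e.g. `kubectl delete pod <pod-name>` in the cluster's namespace.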
EXPECTED
The pod restarts and becomes ready.
ACTUAL
Only 1/2 containers are ready, and the pod never comes back up.
Logs
database 2022-04-15 13:54:13,489 INFO: following a different leader because i am not the healthiest node
database 2022-04-15 13:54:13,494 ERROR: Exception during CHECKPOINT
database Traceback (most recent call last):
database File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 600, in checkpoint
database with get_connection_cursor(**connect_kwargs) as cur:
database File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
database return next(self.gen)
database File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/connection.py", line 44, in get_connection_cursor
database conn = psycopg.connect(**kwargs)
database File "/usr/lib64/python3.6/site-packages/psycopg2/__init__.py", line 127, in connect
database conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
database psycopg2.OperationalError: FATAL: the database system is starting up
database
database 2022-04-15 13:54:13.819 UTC [567] LOG: pgaudit extension initialized
database 2022-04-15 13:54:13,830 INFO: postmaster pid=567
database 2022-04-15 13:54:13.840 UTC [567] LOG: redirecting log output to logging collector process
database 2022-04-15 13:54:13.840 UTC [567] HINT: Future log output will appear in directory "log".
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database /tmp/postgres:5432 - rejecting connections
database 2022-04-15 13:54:19,288 INFO: Lock owner: None; I am ice-postgres-dc1-instance1-684f-0
database 2022-04-15 13:54:19,289 INFO: not healthy enough for leader race
database 2022-04-15 13:54:19,289 INFO: changing primary_conninfo and restarting in progress
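When comparing logs across affected clusters, it may help to pull out just the Patroni leader-election messages. The following is a rough sketch, assuming the message texts match the ones in the excerpt above; the function names are hypothetical, and the check is only a heuristic for "no leader lock is held and this member refuses to take it", not a diagnosis of the root cause.

```python
import re

# Patroni messages taken from the log excerpt in this issue.
LEADER_RACE_PATTERNS = {
    "no_leader_lock": re.compile(r"Lock owner: None"),
    "not_healthy": re.compile(r"not healthy enough for leader race"),
    "following_other": re.compile(r"following a different leader"),
}


def summarize(lines):
    """Count how often each leader-election message appears."""
    counts = {name: 0 for name in LEADER_RACE_PATTERNS}
    for line in lines:
        for name, pattern in LEADER_RACE_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_stuck_without_leader(counts):
    """Heuristic: nobody holds the lock, and this member will not run for it."""
    return counts["no_leader_lock"] > 0 and counts["not_healthy"] > 0


if __name__ == "__main__":
    sample = [
        "database 2022-04-15 13:54:19,288 INFO: Lock owner: None; "
        "I am ice-postgres-dc1-instance1-684f-0",
        "database 2022-04-15 13:54:19,289 INFO: not healthy enough for leader race",
    ]
    counts = summarize(sample)
    print(counts, looks_stuck_without_leader(counts))
```

The input could be the output of `kubectl logs <pod-name> -c database`, read line by line.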
Same problem here. Any update or solution?
Same problem here; this should get a hotfix.
Hello, I cannot replicate this with PGO 5.0.5 / Postgres 13. I do notice that the postgres image I'm using has patroni 2.1.2 (though on closer inspection, it looks like the Patroni error in your logs is not the cause of your pod failing to come up, but a symptom).
If any of you are still experiencing this problem, could you provide your postgrescluster.yaml and also postgres images (if you're using the RELATED_ env vars in PGO to set the images in the postgres pods)?