patroni
patroni copied to clipboard
Unexpected exception during Patroni startup
Getting the following during the Patroni startup. Replication is not affected. But node is not a candidate for swithchover.
localhost:5432 - accepting connections
2022-07-22 12:27:06,229 INFO: Lock owner: db-a; I am db-b
2022-07-22 12:27:06,229 INFO: establishing a new patroni connection to the postgres cluster
2022-07-22 12:27:06,317 ERROR: Unexpected exception
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1514, in run_cycle
info = self._run_cycle()
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1478, in _run_cycle
msg = self.process_healthy_cluster()
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1070, in process_healthy_cluster
self.state_handler.name, lock_owner), refresh=False)
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 440, in follow
change_required, restart_required = self.state_handler.config.check_recovery_conf(node_to_follow)
File "/usr/lib/python3.6/site-packages/patroni/postgresql/config.py", line 702, in check_recovery_conf
primary_conninfo = params['primary_conninfo']
KeyError: 'primary_conninfo'
2022-07-22 12:27:06,320 INFO: Unexpected exception raised, please report it as a BUG
cp: cannot stat '/postgresqlarch01/5432/wal_archive/0000000F0000000100000050': No such file or directory
2022-07-22 12:27:10.262 +08 [1372396] LOG: started streaming WAL from primary at 1/50000000 on timeline 15
2022-07-22 12:27:16,274 INFO: no action. I am (db-b), a secondary, and following a leader (db-a)
2022-07-22 12:27:26,230 INFO: no action. I am (db-b), a secondary, and following a leader (db-a)
I could locate following in the postgresql.conf
which is generated by Patroni
# recovery.conf
primary_conninfo = 'user=replicator passfile=/postgresql01/software/HA/patroni/tmp/pgpass1 host=xx.xx.xx.xx port=5432 sslmode=prefer application_name=db-b gssencmode=prefer'
primary_slot_name = 'db_b'
recovery_target = ''
recovery_target_lsn = ''
...
Any suggestion on what could be wrong?
Once the DEBUG environment variables are enabled, we have more info.
2022-07-26 11:18:50,625 INFO: Lock owner: db-a; I am db-b
2022-07-26 11:18:50,625 DEBUG: does not have lock
2022-07-26 11:18:50,626 INFO: establishing a new patroni connection to the postgres cluster
2022-07-26 11:18:50,663 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2022-07-26 11:18:50,711 DEBUG: http://172.27.33.67:2379 "POST /v3beta/kv/put HTTP/1.1" 200 115
2022-07-26 11:18:50,712 DEBUG: Unexpected exception
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1514, in run_cycle
info = self._run_cycle()
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1478, in _run_cycle
msg = self.process_healthy_cluster()
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1070, in process_healthy_cluster
self.state_handler.name, lock_owner), refresh=False)
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 440, in follow
change_required, restart_required = self.state_handler.config.check_recovery_conf(node_to_follow)
File "/usr/lib/python3.6/site-packages/patroni/postgresql/config.py", line 702, in check_recovery_conf
primary_conninfo = params['primary_conninfo']
KeyError: 'primary_conninfo'
2022-07-26 11:18:50,712 DEBUG: Received message: {'result': {'header': {'cluster_id': '715423625111298638', 'member_id': '7457761695766411049', 'revision': '958', 'raft_term': '344'}, 'events': [{'kv': {'key': 'L3NlcnZpY2UvTU5QLU0xLVBSRC9tZW1iZXJzL2RiLWI=', 'create_revision': '957', 'mod_revision': '958', 'version': '2', 'value': 'eyJjb25uX3VybCI6InBvc3RncmVzOi8vMTcyLjI3LjMzLjY4OjU0MzIvcG9zdGdyZXMiLCJhcGlfdXJsIjoiaHR0cDovLzE3Mi4yNy4zMy42ODo4MDA4L3BhdHJvbmkiLCJzdGF0ZSI6InJ1bm5pbmciLCJyb2xlIjoicmVwbGljYSIsInZlcnNpb24iOiIyLjEuNCIsInBlbmRpbmdfcmVzdGFydCI6dHJ1ZSwieGxvZ19sb2NhdGlvbiI6NjA5MDEyOTQwOCwidGltZWxpbmUiOjE3fQ==', 'lease': '8586537215319964384'}}]}}
2022-07-26 11:18:50,715 INFO: Unexpected exception raised, please report it as a BUG
@CyberDem0n
@jobinau the issue template was created for a purpose. Most of the questions from there are extremely important. From looking at a small chunk of logs it is not possible to tell what is wrong.
I can recommend you to add some debug logs to the check_recovery_conf() method. For example logger.info('params=%s', params)
on the line 702 in order to see what was returned by _read_recovery_params()
Thank you, @CyberDem0n ,
I got access to the same Patroni user's system after a month, and your suggestion to insert logger.info
really helped to understand the code flow and dig deeper. Thanks for that suggestion.
Finally, we could figure out that the user account didn't have the superuser privilege.
superuser:
username: <non-superuser>
which caused all these problems. Now just curious whether it is worth checking whether the user account has proper privilege in the Patroni code would that be a good idea.? I am not sure how frequently other users get into this same trouble.
Now just curious whether it is worth checking whether the user account has proper privilege in the Patroni code would that be a good idea.? I am not sure how frequently other users get into this same trouble.
So far you are the first one. The thing is that Patroni already has full control over the Postgres and $PGDATA. What is the point of removing superuser privileges?