pg_auto_failover
Connection reset by peer in log on datanodes
Hello.
I set up a pg_auto_failover cluster (I have done this several times, and the problem reproduces every time). While running various tests, I noticed that messages like the following appear in the datanode logs almost every minute:
...
2022-08-01 07:47:59.738 MSK [19358()-1] app=[[unknown]],client=[192.168.56.129(34430)] [[unknown]@[unknown]], [vxid: txid:0] [] LOG: could not receive data from client: Connection reset by peer
2022-08-01 07:48:01.740 MSK [19368()-1] app=[[unknown]],client=[192.168.56.129(34480)] [[unknown]@[unknown]], [vxid: txid:0] [] LOG: could not receive data from client: Connection reset by peer
2022-08-01 07:48:47.737 MSK [19544()-1] app=[[unknown]],client=[192.168.56.129(35220)] [[unknown]@[unknown]], [vxid: txid:0] [] LOG: could not receive data from client: Connection reset by peer
2022-08-01 07:48:48.737 MSK [19549()-1] app=[[unknown]],client=[192.168.56.129(35248)] [pgautofailover_monitor@postgres], [vxid:2/0 txid:0] [idle] LOG: could not receive data from client: Connection reset by peer
2022-08-01 07:50:02.744 MSK [19834()-1] app=[[unknown]],client=[192.168.56.129(36412)] [[unknown]@[unknown]], [vxid: txid:0] [] LOG: could not receive data from client: Connection reset by peer
2022-08-01 07:50:03.737 MSK [19839()-1] app=[[unknown]],client=[192.168.56.129(36440)] [pgautofailover_monitor@postgres], [vxid:2/0 txid:0] [idle] LOG: could not receive data from client: Connection reset by peer
...
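To see who these resets are attributed to, the messages can be tallied per client address, user, and database. The following is a minimal Python sketch, not part of pg_auto_failover; the regular expression assumes the log_line_prefix layout shown in the excerpt above, and the function name `count_resets` is made up for illustration.

```python
import re
from collections import Counter

# Assumption: the log lines use the same prefix layout as the excerpt above,
# i.e. "client=[host(port)] [user@database], ... LOG: ...".
RESET_RE = re.compile(
    r"client=\[(?P<host>[^(\]]+)\(\d+\)\]\s+"
    r"\[(?P<user>.+?)@(?P<db>.+?)\],"
    r".*LOG:\s+could not receive data from client: Connection reset by peer"
)


def count_resets(lines):
    """Count 'Connection reset by peer' messages per (host, user, database)."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[(match["host"], match["user"], match["db"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_resets.py < postgresql.log
    for (host, user, db), n in count_resets(sys.stdin).most_common():
        print(f"{n:6d}  {host}  {user}@{db}")
```

In the excerpt above, every entry comes from 192.168.56.129. Some are logged as [unknown]@[unknown], which would suggest the connection was dropped before the session's user and database were known, and the others as pgautofailover_monitor@postgres in the idle state.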
The pgautofailover_monitor user has the CONNECT privilege. On all servers, pg_hba.conf allows this connection with the trust method, and that is the first rule that matches. Example:
# Database administrative login by Unix domain socket
# "local" is for Unix domain socket connections only
# TYPE DATABASE USER ADDRESS METHOD
local all postgres peer
local all mamonsu peer
# IPv4 local connections:
host all "pgautofailover_monitor" 192.168.56.129/32 trust # Auto-generated by pg_auto_failover
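The claim that the trust rule is the first one to match can be checked with a simplified first-match lookup. The Python sketch below is only an illustration of the top-to-bottom matching order. It handles just `local` and `host` lines with CIDR addresses and plain or `all` database and user fields, which is an assumption that happens to cover the example above; real pg_hba.conf syntax has many more options. The names `parse_hba` and `first_match` are hypothetical.

```python
import ipaddress


def parse_hba(text):
    """Parse a simplified subset of pg_hba.conf into a list of rule dicts."""
    rules = []
    for raw in text.splitlines():
        # Drop comments and blank lines (assumes no '#' inside quoted names).
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == "local" and len(fields) >= 4:
            rules.append({"type": "local", "db": fields[1],
                          "user": fields[2].strip('"'),
                          "addr": None, "method": fields[3]})
        elif fields[0] == "host" and len(fields) >= 5:
            rules.append({"type": "host", "db": fields[1],
                          "user": fields[2].strip('"'),
                          "addr": ipaddress.ip_network(fields[3], strict=False),
                          "method": fields[4]})
    return rules


def first_match(rules, conn_type, db, user, addr=None):
    """Return the first rule matching the connection, or None."""
    for rule in rules:
        if rule["type"] != conn_type:
            continue
        if rule["db"] not in ("all", db):
            continue
        if rule["user"] not in ("all", user):
            continue
        if conn_type == "host" and ipaddress.ip_address(addr) not in rule["addr"]:
            continue
        return rule
    return None
```

With the example file above, looking up a `host` connection to `postgres` as `pgautofailover_monitor` from 192.168.56.129 returns the rule whose method is `trust`.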
The monitor server (192.168.56.129) connects as the pgautofailover_monitor user to the postgres database on the datanodes to check their availability (according to the documentation, this works something like a ping).
I tried switching the rule from trust to md5 (I did not change the pgautofailover_monitor password, because I learned from one of the issues that it is hardcoded), but that did not help.
All functionality works: automatic switching, switchover, failover, and so on. Still, these messages worry me because this is not normal behavior, the log fills up with them, and this is delaying our decision to deploy pg_auto_failover in a production environment.
Versions of the software I use:
CentOS Linux release 7.9.2009 (Core)
PostgreSQL 14.4
pg_auto_failover 1.6.4
pg_autoctl version 1.6.4
pg_autoctl extension version 1.6
I would be grateful for your help in solving this problem.