"Connection reset by peer" messages in the logs on data nodes

Status: Open. xinferum opened this issue 1 year ago (0 comments).

Hello.

I set up a pg_auto_failover cluster (I have done this several times already, and the problem reproduces every time). During various tests, I noticed that messages like the following appear in the data node logs almost every minute:

...
2022-08-01 07:47:59.738 MSK [19358()-1] app=[[unknown]],client=[192.168.56.129(34430)] [[unknown]@[unknown]], [vxid: txid:0] [] LOG:  could not receive data from client: Connection reset by peer
2022-08-01 07:48:01.740 MSK [19368()-1] app=[[unknown]],client=[192.168.56.129(34480)] [[unknown]@[unknown]], [vxid: txid:0] [] LOG:  could not receive data from client: Connection reset by peer
2022-08-01 07:48:47.737 MSK [19544()-1] app=[[unknown]],client=[192.168.56.129(35220)] [[unknown]@[unknown]], [vxid: txid:0] [] LOG:  could not receive data from client: Connection reset by peer
2022-08-01 07:48:48.737 MSK [19549()-1] app=[[unknown]],client=[192.168.56.129(35248)] [pgautofailover_monitor@postgres], [vxid:2/0 txid:0] [idle] LOG:  could not receive data from client: Connection reset by peer
2022-08-01 07:50:02.744 MSK [19834()-1] app=[[unknown]],client=[192.168.56.129(36412)] [[unknown]@[unknown]], [vxid: txid:0] [] LOG:  could not receive data from client: Connection reset by peer
2022-08-01 07:50:03.737 MSK [19839()-1] app=[[unknown]],client=[192.168.56.129(36440)] [pgautofailover_monitor@postgres], [vxid:2/0 txid:0] [idle] LOG:  could not receive data from client: Connection reset by peer
...
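To check the "almost every minute" observation and see which client and role the resets come from, the matching lines can be tallied with a short Python script. This is a minimal sketch that assumes the log_line_prefix layout visible in the excerpt above (timestamp, time zone, [pid()-n], app=, client=[host(port)], [user@database]); the names LINE_RE, parse_resets and summarize are hypothetical and are not part of PostgreSQL or pg_auto_failover.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed prefix layout, copied from the excerpt in this issue, e.g.
# 2022-08-01 07:48:48.737 MSK [19549()-1] app=[[unknown]],client=[192.168.56.129(35248)] [pgautofailover_monitor@postgres], ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ "
    r"\[(?P<pid>\d+)\(\)-\d+\] "
    r"app=\[(?P<app>.*?)\],client=\[(?P<host>[^(\]]+)\((?P<port>\d+)\)\] "
    r"\[(?P<user>.*?)@(?P<db>.*?)\], "
    r".*could not receive data from client: Connection reset by peer$"
)


def parse_resets(lines):
    """Yield (timestamp, client_host, user) for each connection-reset entry."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            yield ts, m["host"], m["user"]


def summarize(lines):
    """Count resets per (host, user) and list the seconds between consecutive resets."""
    events = list(parse_resets(lines))
    by_source = Counter((host, user) for _, host, user in events)
    times = sorted(ts for ts, _, _ in events)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    return by_source, gaps
```

For example, `summarize(open("postgresql.log"))` (the file name is a placeholder) returns a Counter keyed by (client host, user) plus the gaps in seconds. On the six lines quoted above, it attributes four resets to `[unknown]` and two to `pgautofailover_monitor`, all from 192.168.56.129, arriving in pairs one to two seconds apart.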

The pgautofailover_monitor user has the CONNECT privilege. Access is set to trust in pg_hba.conf on all servers, and this is the first rule that matches, for example:

# Database administrative login by Unix domain socket
# "local" is for Unix domain socket connections only

# TYPE DATABASE USER ADDRESS METHOD
local all postgres peer
local all mamonsu peer

# IPv4 local connections:
host all "pgautofailover_monitor" 192.168.56.129/32 trust # Auto-generated by pg_auto_failover
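The "first rule that matches" point rests on how PostgreSQL reads pg_hba.conf: records are checked top to bottom, the first one whose connection type, database, user, and client address all match decides the authentication method, and if none matches the connection is rejected. The Python sketch below models only that first-match behaviour for the excerpt above. It is a deliberate simplification with hypothetical names (HbaRule, parse_hba, first_match): it handles only `local` and `host` lines, the `all` keyword, and CIDR addresses, and ignores hostssl/hostnossl, replication, +group and @file entries, hostnames, and separate netmask columns.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HbaRule:
    conn_type: str          # only "local" or "host" in this simplified sketch
    database: str
    user: str
    address: Optional[str]  # CIDR string for host rules, None for local rules
    method: str


def parse_hba(text: str) -> List[HbaRule]:
    """Parse a simplified pg_hba.conf; comments and blank lines are skipped."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive: assumes no '#' inside quotes
        if not line:
            continue
        fields = line.split()
        if fields[0] == "local":
            conn_type, database, user, method = fields[:4]
            address = None
        else:
            conn_type, database, user, address, method = fields[:5]
        # In real pg_hba.conf quotes make a name literal; this sketch just strips them.
        rules.append(HbaRule(conn_type, database, user.strip('"'), address, method))
    return rules


def first_match(rules, conn_type, database, user, client_ip=None) -> Optional[HbaRule]:
    """Return the first matching rule, or None (PostgreSQL would reject the connection)."""
    for rule in rules:
        if rule.conn_type != conn_type:
            continue
        if rule.database not in ("all", database):
            continue
        if rule.user not in ("all", user):
            continue
        if rule.address is not None:
            network = ipaddress.ip_network(rule.address, strict=False)
            if ipaddress.ip_address(client_ip) not in network:
                continue
        return rule
    return None
```

With the rules from the excerpt, a TCP connection from 192.168.56.129 as pgautofailover_monitor to the postgres database resolves to the `trust` rule, while the same user connecting from any other address matches nothing.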

Using the pgautofailover_monitor user, the monitor server connects to the postgres database on each data node to check its availability (the documentation describes this as something like a ping).

I tried switching from trust to md5 (I did not change the pgautofailover_monitor password, because I learned from another issue that it is hardcoded), and it did not help.

All functionality works: automatic switching, switchover, failover, etc. I am concerned about these messages because this is not normal behavior, the log fills up with them, and this is holding back our decision to deploy pg_auto_failover in a production environment.

Versions of the software I use:
- CentOS Linux release 7.9.2009 (Core)
- PostgreSQL 14.4
- pg_auto_failover 1.6.4
- pg_autoctl version 1.6.4
- pg_autoctl extension version 1.6

I would be grateful for your help in solving this problem.

xinferum, Aug 01 '22 07:08