repmgr

Running "cluster show" from standby or witness hangs when the IP is down on the primary

Open · linkonsupport opened this issue 4 years ago · 1 comment

This is more of a question, because I don't know whether I have missed some important parameter setting in repmgr.conf.

We run a script on the primary node to simulate a network outage, like this:

    #!/bin/sh
    ip link set eno1 down
    sleep 60
    ip link set eno1 up

During this time, running the following command from the standby or witness node works without problems; it times out correctly:

    ssh -q -o StrictHostKeyChecking=no -o ConnectTimeout=1

But when I run "cluster show" from the standby or witness, it just hangs until the IP is up again on the primary. Nothing appears in the log on the standby or witness until the IP is back:

    repmgr -f repmgr.conf cluster show

In repmgr.conf (I tried different settings on all the nodes involved, with no success):

    ssh_options='-q -o StrictHostKeyChecking=no -o ConnectTimeout=1'
    #ssh_options='-q -o StrictHostKeyChecking=no -o ConnectTimeout=10'

What am I doing wrong? I thought repmgr would immediately detect the problem and take action.

linkonsupport, Nov 16 '20 11:11

repmgr cluster show attempts to make database connections to the other node(s), and doesn't use SSH.

Do you have connect_timeout set in your conninfo strings? If it is not present, the PostgreSQL connection attempt will wait until the underlying network connection times out before reporting failure, which can take quite a long time depending on your environment.
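A minimal sketch of what that setting looks like, assuming a POSIX shell. The helper with_connect_timeout and the host, user and dbname values are hypothetical and not part of repmgr; the helper appends connect_timeout to a libpq conninfo string unless the string already sets it. With connect_timeout unset or 0, libpq itself waits indefinitely, so only the operating system's TCP timeout ends the attempt.

```sh
#!/bin/sh
# Hypothetical helper (not part of repmgr): print a libpq conninfo string
# with connect_timeout added, unless the string already sets it.
# Assumes key=value pairs separated by spaces, with no spaces around '='.
with_connect_timeout() {
    conninfo=$1
    secs=${2:-2}   # timeout in seconds; defaults to 2
    case " $conninfo " in
        *" connect_timeout="*)
            # Already present: leave the string untouched.
            printf '%s\n' "$conninfo" ;;
        *)
            # Missing: append connect_timeout with the requested value.
            printf '%s connect_timeout=%s\n' "$conninfo" "$secs" ;;
    esac
}

# Example with hypothetical connection parameters:
with_connect_timeout 'host=node1 user=repmgr dbname=repmgr'
# prints: host=node1 user=repmgr dbname=repmgr connect_timeout=2
```

The resulting string is what would go on the conninfo line of repmgr.conf, e.g. conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'. As far as I know, repmgr takes the other nodes' conninfo from the repmgr.nodes table populated at registration time, so after editing repmgr.conf the nodes may need re-registering (e.g. with --force) before "cluster show" picks up the change; check the repmgr documentation for your version.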

It might also be worth looking at your network settings, e.g. net.ipv4.tcp_syn_retries.
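The arithmetic behind net.ipv4.tcp_syn_retries can be sketched in a few lines of shell. This is an approximation, not the kernel's exact algorithm: it assumes a 1-second initial SYN retransmission timeout that doubles after each retry, capped at 120 seconds, and the syn_timeout function is hypothetical. With the Linux default of 6 retries it gives 1+2+4+8+16+32+64 = 127 seconds, matching the "approximately 127 seconds" stated in the tcp(7) man page. That is longer than the 60-second outage in the test script above, which would be consistent with the command appearing to hang until the link comes back.

```sh
#!/bin/sh
# Rough estimate (an approximation, not the kernel's exact algorithm) of how
# long connect() can block when SYNs go unanswered. Assumes an initial
# retransmission timeout of 1 s that doubles after each retry, capped at 120 s.
syn_timeout() {
    retries=$1
    rto=1
    total=0
    i=0
    # The original SYN plus $retries retransmissions each wait one timeout.
    while [ "$i" -le "$retries" ]; do
        total=$((total + rto))
        rto=$((rto * 2))
        if [ "$rto" -gt 120 ]; then
            rto=120
        fi
        i=$((i + 1))
    done
    echo "$total"
}

# Use the live setting on Linux if readable, else assume the default of 6.
current=$(cat /proc/sys/net/ipv4/tcp_syn_retries 2>/dev/null || echo 6)
echo "tcp_syn_retries=$current: connect() may block for about $(syn_timeout "$current") s"
```

Lowering the sysctl (for example sysctl -w net.ipv4.tcp_syn_retries=2, about 7 seconds by this estimate) affects every outgoing TCP connection on the host, whereas connect_timeout in conninfo limits only the PostgreSQL connections, so the latter is usually the more targeted fix.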

ibarwick, Nov 30 '20 06:11