repmgr
Running cluster show from standby or witness hangs when IP is down on primary
This is more of a question, because I don't know whether I have missed an important parameter setting in repmgr.conf.
We run a script on the primary node to simulate a network outage, like this:

    #!/bin/sh
    ip link set eno1 down
    sleep 60
    ip link set eno1 up
During this time there is no problem running the command below from the standby or witness node; it times out correctly:
ssh -q -o StrictHostKeyChecking=no -o ConnectTimeout=1
But when I run "cluster show" from the standby or witness, it just hangs until the IP is up again on the primary. Nothing appears in the log on the standby or witness until the IP is back.

    repmgr -f repmgr.conf cluster show
In repmgr.conf (I tried different settings on all the nodes involved, with no success):

    ssh_options='-q -o StrictHostKeyChecking=no -o ConnectTimeout=1'
    #ssh_options='-q -o StrictHostKeyChecking=no -o ConnectTimeout=10'
What am I doing wrong? I thought that repmgr would immediately detect the problem and take action.
repmgr cluster show attempts to make database connections to the other node(s), and doesn't use SSH.
Do you have connect_timeout set in your conninfo strings? If this is not present, PostgreSQL's connection attempt will wait until the network connection times out before returning failure, which can be quite a long time depending on your environment.
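As a rough illustration, a conninfo line in repmgr.conf with a timeout might look like the fragment below. The host, user, and database names are placeholders, not taken from your setup. Note that libpq treats connect_timeout values below 2 as 2 seconds.

```
# repmgr.conf on each node (host/user/dbname are hypothetical placeholders)
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
```

Since cluster show takes the other nodes' connection strings from the repmgr metadata, I would expect each node to need re-registering (with --force) after changing conninfo before the new value takes effect.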
It might also be worth looking at your network settings, e.g. net.ipv4.tcp_syn_retries.
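To get a feel for why the hang can last so long without connect_timeout, here is a small sketch that estimates how long a blocked connection attempt waits for a given tcp_syn_retries value. It assumes the usual Linux behaviour of an initial SYN timeout of 1 second that doubles after each retry (capped at 120 seconds); the function name syn_timeout is just made up for this example, and actual timings may differ on your kernel.

```shell
#!/bin/sh
# Estimate the total time (in seconds) before a TCP connect gives up,
# assuming a 1 s initial SYN timeout that doubles after every retry
# and is capped at 120 s. These constants are assumptions about Linux defaults.
syn_timeout() {
    retries=$1
    rto=1      # current wait before the next retransmission
    total=0    # accumulated wait
    i=0
    # One wait after the initial SYN plus one after each retry.
    while [ "$i" -le "$retries" ]; do
        total=$((total + rto))
        rto=$((rto * 2))
        if [ "$rto" -gt 120 ]; then
            rto=120
        fi
        i=$((i + 1))
    done
    echo "$total"
}

# Read the current setting if available; fall back to 6, the usual default.
current=$(cat /proc/sys/net/ipv4/tcp_syn_retries 2>/dev/null || echo 6)
echo "tcp_syn_retries=$current -> roughly $(syn_timeout "$current") s before connect fails"
```

With the common default of 6 retries this works out to about two minutes, which is longer than the 60 seconds the interface is down in your test, so the command would appear to hang until the IP comes back.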