repmgr
Followed the documentation but got a different result at the follow-new-primary step
Hello,
I'm learning repmgr and was working through /promoting-standby.html, but ran into a problem at the follow-new-primary step (follow-new-primary.xml).
Version: repmgr 5.3.2
OS
postgres@bd3:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Configuration file (note: node_id and node_name change for each node):
node_id=1
node_name='bd1'
conninfo='host=bd1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/14/main'
pg_basebackup_options=''
ssh_options='-q -o ConnectTimeout=10'
service_start_command='sudo pg_ctlcluster 14 main start'
service_stop_command='sudo pg_ctlcluster 14 main stop'
service_restart_command='sudo pg_ctlcluster 14 main restart'
service_reload_command='sudo pg_ctlcluster 14 main reload'
Following https://repmgr.org/docs/current/promoting-standby.html works as expected:
postgres@bd1:~$ repmgr cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 1  | bd1  | primary | * running |          | default  | 100      | 1        | host=bd1 user=repmgr dbname=repmgr connect_timeout=2
 2  | bd2  | standby |   running | bd1      | default  | 100      | 1        | host=bd2 user=repmgr dbname=repmgr connect_timeout=2
 3  | bd3  | standby |   running | bd1      | default  | 100      | 1        | host=bd3 user=repmgr dbname=repmgr connect_timeout=2
On the primary (bd1):
pg_ctlcluster 14 main stop
On the first standby, bd2:
postgres@bd2:~$ repmgr standby promote
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
bd3 (node ID: 3)
NOTICE: promoting standby to primary
DETAIL: promoting server "bd2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "bd2" (ID: 2) was successfully promoted to primary
The cluster status is then:
postgres@bd2:~$ repmgr cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 1  | bd1  | primary | - failed  | ?        | default  | 100      |          | host=bd1 user=repmgr dbname=repmgr connect_timeout=2
 2  | bd2  | primary | * running |          | default  | 100      | 2        | host=bd2 user=repmgr dbname=repmgr connect_timeout=2
 3  | bd3  | standby |   running | ? bd1    | default  | 100      | 1        | host=bd3 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "bd1" (ID: 1)
- unable to connect to node "bd3" (ID: 3)'s upstream node "bd1" (ID: 1)
- unable to determine if node "bd3" (ID: 3) is attached to its upstream node "bd1" (ID: 1)
HINT: execute with --verbose option to see connection error messages
The second standby, bd3, then needs to follow bd2:
postgres@bd3:~$ repmgr standby follow
NOTICE: attempting to find and follow current primary
INFO: local node 3 can attach to follow target node 2
DETAIL: local node's recovery point: 0/40000A0; follow target node's fork point: 0/40000A0
NOTICE: setting node 3's upstream to node 2
WARNING: node "bd3" not found in "pg_stat_replication"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "bd2" (ID: 2)
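The warning above says that bd3 was not found in "pg_stat_replication". A minimal sketch for checking this by hand is shown below. It assumes psql is available, that the repmgr user can connect to both nodes with the same parameters as in conninfo, and that it has enough privileges to read these views. The function names and the PSQL override variable are made up for illustration and are not part of repmgr.

```shell
# Illustrative helpers only; not part of repmgr.
# PSQL can be overridden to point at a specific psql binary.
PSQL=${PSQL:-psql}

# On the new primary: list the standbys currently streaming from it.
show_attached_standbys() {
    primary_host=$1
    "$PSQL" -h "$primary_host" -U repmgr -d repmgr -At \
        -c "SELECT application_name, state, sent_lsn FROM pg_stat_replication"
}

# On a standby: show what its WAL receiver is connected to.
show_wal_receiver() {
    standby_host=$1
    "$PSQL" -h "$standby_host" -U repmgr -d repmgr -At \
        -c "SELECT status, sender_host, slot_name FROM pg_stat_wal_receiver"
}
```

For this cluster, that would be `show_attached_standbys bd2` followed by `show_wal_receiver bd3`. An empty result from the first call would match the warning; the second call shows which host bd3 is actually receiving WAL from, if any.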
The documentation does not mention anything about a missing replication slot: https://github.com/EnterpriseDB/repmgr/blob/master/doc/follow-new-primary.xml#L18
The cluster status is then:
postgres@bd2:~$ repmgr cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 1  | bd1  | primary | - failed  | ?        | default  | 100      |          | host=bd1 user=repmgr dbname=repmgr connect_timeout=2
 2  | bd2  | primary | * running |          | default  | 100      | 2        | host=bd2 user=repmgr dbname=repmgr connect_timeout=2
 3  | bd3  | standby |   running | bd2      | default  | 100      | 1        | host=bd3 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "bd1" (ID: 1)
HINT: execute with --verbose option to see connection error messages
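In the output above, bd3 now lists bd2 as its upstream but still reports timeline 1, while bd2 is on timeline 2. A differing timeline is not by itself proof of a problem (the reported value may simply lag), but it is easy to overlook. The sketch below flags such mismatches; it assumes the column layout shown above, and the function name is hypothetical.

```shell
# Illustrative helper only; not part of repmgr.
# Reads `repmgr cluster show` output on stdin and prints every running
# standby whose Timeline column differs from the running primary's.
# If no running primary is found, every running standby is reported.
check_timelines() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Only consider table rows: 9 columns and a numeric node ID.
        NF >= 9 && trim($1) ~ /^[0-9]+$/ {
            name = trim($2); role = trim($3); status = trim($4); tl = trim($8)
            if (role == "primary" && status == "* running")
                primary_tl = tl
            else if (role == "standby" && status == "running")
                standby[name] = tl
        }
        END {
            for (n in standby)
                if (standby[n] != primary_tl)
                    printf "%s: timeline %s, primary timeline %s\n", n, standby[n], primary_tl
        }'
}
```

Usage: `repmgr cluster show | check_timelines`. Fed the table above, it prints `bd3: timeline 1, primary timeline 2`.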
Do you have any idea what needs to be done to fix this?
I created a replication slot for bd3, but it made no difference.
Robin,