
The old primary cannot rejoin the cluster

Open Dragonzlx opened this issue 3 years ago • 2 comments

Hi. I built a three-node repmgr cluster with dragon01 as the primary and the other two nodes as standbys. I then stopped the primary database on dragon01 and promoted dragon02 to be the new primary. When I ran the repmgr node rejoin command (shown in the rejoin step below) to make dragon01 rejoin the cluster, it failed. Could you please help explain why and let me know how to solve this problem? Thanks a lot.

  1. The cluster environment

PostgreSQL 12.3
repmgr 5.1
dragon01: old primary
dragon02: new primary
dragon03: standby

  2. The current cluster status

[postgres@dragon03 08:57:38 ~]# repmgr -f repmgr.conf cluster show
 ID | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------
 1  | dragon01 | primary | - failed  | ?        | dc1      | 100      |          | host=dragon01 port=26432 user=repmgr dbname=repmgr connect_timeout=10
 2  | dragon02 | primary | * running |          | dc1      | 90       | 2        | host=dragon02 port=26432 user=repmgr dbname=repmgr connect_timeout=10
 3  | dragon03 | standby |   running | dragon02 | dc2      | 80       | 2        | host=dragon03 port=26432 user=repmgr dbname=repmgr connect_timeout=10

  3. The pg_hba.conf settings on dragon02

[postgres@dragon02 09:10:55 /appdata/pgsql/pg12]# tail -5 pg_hba.conf
host replication repmgr 192.168.35.0/24 md5
host repmgr repmgr 192.168.35.0/24 md5
host all postgres 192.168.35.0/24 md5
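To reason about which of these rules a given connection hits, the first-match logic of pg_hba.conf can be sketched in Python. This is a simplified illustration, not PostgreSQL's implementation: it handles only `host` lines with a CIDR address, and the client address `192.168.35.11` is a hypothetical stand-in because the thread does not give dragon01's IP. The one behaviour it relies on is that the `all` keyword does not match physical replication connections, so those are covered only by the `replication` rule.

```python
import ipaddress

# The three rules shown in the tail of pg_hba.conf on dragon02.
PG_HBA_RULES = """\
host replication repmgr 192.168.35.0/24 md5
host repmgr repmgr 192.168.35.0/24 md5
host all postgres 192.168.35.0/24 md5
"""


def first_matching_rule(database, user, client_ip, replication=False):
    """Return (rule_line, auth_method) for the first matching rule, or None.

    Simplified: only 'host' lines, CIDR addresses, literal names and the
    keywords 'all' and 'replication' are understood.
    """
    addr = ipaddress.ip_address(client_ip)
    for line in PG_HBA_RULES.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] != "host":
            continue
        _, rule_db, rule_user, cidr, method = fields
        if replication:
            # Physical replication connections match only the 'replication'
            # keyword; 'all' does not cover them.
            db_ok = rule_db == "replication"
        else:
            db_ok = rule_db in ("all", database)
        user_ok = rule_user in ("all", user)
        if db_ok and user_ok and addr in ipaddress.ip_network(cidr):
            return line, method
    return None


# Hypothetical client address inside the 192.168.35.0/24 network.
print(first_matching_rule("repmgr", "repmgr", "192.168.35.11", replication=True))
print(first_matching_rule("repmgr", "repmgr", "192.168.35.11"))
```

Under these assumptions both the ordinary connection and the replication connection end up on an md5 rule, so both need a password from somewhere. That is consistent with the "fe_sendauth: no password supplied" error reported further down, but it does not by itself explain why one of them fails.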

  4. The /home/postgres/.pgpass settings on all three nodes

[postgres@dragon02 09:25:36 ~]# cat .pgpass
#hostname:port:database:username:password
dragon01:26432:repmgr:repmgr:repmgr
dragon01:26432:repmgr:postgres:postgres
dragon01:26432:replication:repmgr:repmgr
dragon02:26432:repmgr:repmgr:repmgr
dragon02:26432:repmgr:postgres:postgres
dragon02:26432:replication:repmgr:repmgr
dragon03:26432:repmgr:repmgr:repmgr
dragon03:26432:repmgr:postgres:postgres
dragon03:26432:replication:repmgr:repmgr
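The password file is matched line by line: each of the first four fields is either a literal or `*`, and the password from the first matching line is used. A minimal Python sketch of that lookup, run against the file above, makes it easy to check which host/port/database/user combinations are covered. The function name `lookup_password` is made up for this illustration, backslash escapes are not handled, and which database name repmgr actually presents on its replication connection is an assumption that this thread does not confirm.

```python
PGPASS = """\
#hostname:port:database:username:password
dragon01:26432:repmgr:repmgr:repmgr
dragon01:26432:repmgr:postgres:postgres
dragon01:26432:replication:repmgr:repmgr
dragon02:26432:repmgr:repmgr:repmgr
dragon02:26432:repmgr:postgres:postgres
dragon02:26432:replication:repmgr:repmgr
dragon03:26432:repmgr:repmgr:repmgr
dragon03:26432:repmgr:postgres:postgres
dragon03:26432:replication:repmgr:repmgr
"""


def lookup_password(pgpass_text, host, port, database, user):
    """Return the password of the first matching .pgpass line, or None.

    Simplified: fields are split on ':' without handling '\\:' escapes.
    """
    wanted = (host, str(port), database, user)
    for line in pgpass_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) != 5:
            continue  # malformed line, ignore it
        *patterns, password = fields
        if all(p == "*" or p == w for p, w in zip(patterns, wanted)):
            return password
    return None


# Covered: a connection to dragon02 as repmgr, database "repmgr" or "replication".
print(lookup_password(PGPASS, "dragon02", 26432, "repmgr", "repmgr"))
print(lookup_password(PGPASS, "dragon02", 26432, "replication", "repmgr"))
# Not covered: any other database name, or the host given as an IP address.
print(lookup_password(PGPASS, "dragon02", 26432, "postgres", "repmgr"))
print(lookup_password(PGPASS, "192.168.35.12", 26432, "repmgr", "repmgr"))
```

The last two calls return `None`: the file has no wildcard entries, so a connection that names the host or database differently from these literals would find no password. The address `192.168.35.12` is a hypothetical example.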

  5. dragon01 fails to rejoin the cluster, even when I put the password in the conninfo string

[postgres@dragon01 09:13:35 ~]# repmgr node rejoin -f repmgr.conf -d 'host=dragon02 port=26432 dbname=repmgr user=repmgr' --force-rewind --verbose --dry-run
NOTICE: using provided configuration file "repmgr.conf"
INFO: replication slots in use, 2 free slots on node 9
ERROR: connection to database failed
DETAIL: fe_sendauth: no password supplied

ERROR: unable to establish a replication connection to the rejoin target node

[postgres@dragon01 09:27:02 ~]# repmgr node rejoin -f repmgr.conf -d 'host=dragon02 port=26432 dbname=repmgr user=repmgr password=repmgr' --force-rewind --verbose --dry-run
NOTICE: using provided configuration file "repmgr.conf"
INFO: replication slots in use, 2 free slots on node 9
ERROR: connection to database failed
DETAIL: fe_sendauth: no password supplied

ERROR: unable to establish a replication connection to the rejoin target node

  6. repmgr.conf settings on node dragon01

node_id=1
node_name=dragon01
conninfo='host=dragon01 port=26432 user=repmgr dbname=repmgr connect_timeout=10'
pg_bindir='/usr/local/pgsql123/bin'
config_directory='/appdata/pgsql/pg12'
replication_type=physical
location='dc1'
priority=100
use_replication_slots=true
reconnect_attempts=5
reconnect_interval=10
monitoring_history=yes
monitor_interval_secs=3
failover=automatic
promote_command='/usr/local/pgsql123/bin/repmgr standby promote -f /home/postgres/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql123/bin/repmgr standby follow -f /home/postgres/repmgr.conf --log-to-file --upstream-node-id=%n'
data_directory='/appdata/pgsql/pg12'
ssh_options='-q -o ConnectTimeout=10'
log_file='/tmp/repmgr.log'
log_level=info
log_facility=STDERR
log_status_interval=300
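One thing worth checking in a configuration like this is whether the conninfo string itself carries any password source. The Python sketch below splits a simple keyword=value conninfo and looks for `password` or `passfile`. It is a simplified parser written for this illustration (no support for quoted values or spaces around `=`), not libpq's own parser.

```python
def parse_conninfo(conninfo):
    """Split a simple 'key=value key=value' conninfo string into a dict.

    Simplified: quoted values and spaces around '=' are not supported.
    """
    params = {}
    for token in conninfo.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"not a keyword=value pair: {token!r}")
        params[key] = value
    return params


def password_sources(conninfo):
    """List which password-related keywords the conninfo string sets."""
    params = parse_conninfo(conninfo)
    return [key for key in ("password", "passfile") if key in params]


# conninfo as configured in repmgr.conf on dragon01.
REPMGR_CONNINFO = "host=dragon01 port=26432 user=repmgr dbname=repmgr connect_timeout=10"

print(parse_conninfo(REPMGR_CONNINFO))
print(password_sources(REPMGR_CONNINFO))
```

For the conninfo above the result is an empty list, so a connection built from it has to get its password from the PGPASSWORD environment variable or from the password file.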

  7. I ran the rejoin command under strace. It shows that .pgpass is found and read, so I think repmgr should be able to use .pgpass to connect to the current primary, dragon02.

stat("/home/postgres/.pgpass", {st_mode=S_IFREG|0600, st_size=399, ...}) = 0
open("/home/postgres/.pgpass", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0600, st_size=399, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ae6ee2ae000
read(3, "#hostname:port:database:username"..., 4096) = 399
close(3) = 0
munmap(0x2ae6ee2ae000, 4096) = 0
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(3) = 0
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)

  8. The repmgr user can connect to dragon02 from dragon01 with psql

[postgres@dragon01 09:14:16 ~]# psql -U repmgr -h dragon02 -p 26432 -d repmgr
psql (12.3)
Type "help" for help.

repmgr=# \q
[postgres@dragon01 09:14:32 ~]#

Dragonzlx, Aug 16 '20 01:08

Can you provide the output of:

SELECT * FROM repmgr.nodes (in database repmgr)?

Thanks.

ibarwick, Aug 18 '20 08:08

Hi, the output is below. As a temporary workaround we export PGPASSWORD, but that is not a good solution.

postgres=# \c repmgr repmgr
Password for user repmgr:
You are now connected to database "repmgr" as user "repmgr".
repmgr=#
repmgr=# SELECT * FROM repmgr.nodes
repmgr-# ;
 node_id | upstream_node_id | active | node_name |  type   | location | priority |                               conninfo                                | repluser |   slot_name   |        config_file
---------+------------------+--------+-----------+---------+----------+----------+-----------------------------------------------------------------------+----------+---------------+----------------------------
       1 |                  | t      | dragon01  | primary | dc1      |      100 | host=dragon01 port=26432 user=repmgr dbname=repmgr connect_timeout=10 | repmgr   | repmgr_slot_1 | /home/postgres/repmgr.conf
       3 |                1 | t      | dragon03  | standby | dc2      |       80 | host=dragon03 port=26432 user=repmgr dbname=repmgr connect_timeout=10 | repmgr   | repmgr_slot_3 | /home/postgres/repmgr.conf
       2 |                1 | f      | dragon02  | standby | dc1      |       90 | host=dragon02 port=26432 user=repmgr dbname=repmgr connect_timeout=10 | repmgr   | repmgr_slot_2 | /home/postgres/repmgr.conf
(3 rows)
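If the PGPASSWORD workaround mentioned above has to stay in place for a while, one way to limit its exposure is to set the variable only for the single repmgr process instead of exporting it in the login shell. The Python sketch below builds the command line from this issue together with a private copy of the environment. The helper name `rejoin_invocation` is made up for this illustration. It narrows the scope of the variable but is not a fix: PGPASSWORD can still be visible to other users on some systems.

```python
import os


def rejoin_invocation(password, target_conninfo, base_env=None):
    """Return (argv, env) for the rejoin dry run, with PGPASSWORD set
    only in the returned copy of the environment."""
    env = dict(os.environ if base_env is None else base_env)
    env["PGPASSWORD"] = password
    argv = [
        "repmgr", "node", "rejoin",
        "-f", "repmgr.conf",
        "-d", target_conninfo,
        "--force-rewind", "--verbose", "--dry-run",
    ]
    return argv, env


# To run it on dragon01 (not executed here):
#   import subprocess
#   argv, env = rejoin_invocation(password, "host=dragon02 port=26432 dbname=repmgr user=repmgr")
#   subprocess.run(argv, env=env, check=True)
```

Passing `env=` to `subprocess.run` leaves the calling shell and the parent Python process untouched, so the password does not linger in later sessions.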

Dragonzlx, Aug 18 '20 08:08