repmgr: a new master is promoted automatically, but the other standby stops

Open m-jayson opened this issue 5 years ago • 11 comments

I have an issue, which I also posted on DBA Stack Exchange: https://dba.stackexchange.com/questions/276557/repmgr-it-automatically-promotes-to-new-master-but-other-standby-stopped

However, I would like to understand what happened.

I tested whether my automatic failover works, and it did: I terminated my primary container, and my secondary container was promoted. Unfortunately, my third container stopped. Here is the log (image).

I'm running the official postgres Docker image (v10), and here is the script that writes my repmgr.conf:

NET_IF=`netstat -rn | awk '/^0.0.0.0/ {thif=substr($0,74,10); print thif;} /^default.*UG/ {thif=substr($0,65,10); print thif;}'`
NET_IP=`ifconfig ${NET_IF} | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | grep -Eo '([0-9]*\.){3}[0-9]*' | grep -v '127.0.0.1'` 

HOSTNAME='postgres-'${my_node}

cat<<EOF > /etc/repmgr.conf
	node_id=${my_node}
	node_name=$HOSTNAME
	conninfo='host=${NET_IP} user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
	data_directory='${PGDATA}'

	log_level=INFO
	log_facility=STDERR
	log_status_interval=300
	
	pg_bindir='/usr/lib/postgresql/10/bin'
	use_replication_slots=1
	
	failover=automatic
	promote_command='repmgr standby promote'
	follow_command='repmgr standby follow -W'
EOF

I also tried adding this

#	service_start_command='pg_ctl -D ${PGDATA} start'
#	service_stop_command='pg_ctl -D ${PGDATA} stop -m fast'
#	service_reload_command='pg_ctl -D ${PGDATA} reload'
#service_restart_command='pg_ctl -D ${PGDATA} restart -m fast'

but got the same result.

I hope someone can help me with this. Thanks,

m-jayson avatar Oct 05 '20 18:10 m-jayson

At this point we haven't made any particular provision for repmgr to run in Docker, so it's possible there may be issues of one kind or another.

I also tried adding this

#	service_start_command='pg_ctl -D ${PGDATA} start'
#	service_stop_command='pg_ctl -D ${PGDATA} stop -m fast'
#	service_reload_command='pg_ctl -D ${PGDATA} reload'
#service_restart_command='pg_ctl -D ${PGDATA} restart -m fast'

but same result.

Did you try adding these items without the leading #? I.e.

service_start_command='pg_ctl -D ${PGDATA} start'
service_stop_command='pg_ctl -D ${PGDATA} stop -m fast'
service_reload_command='pg_ctl -D ${PGDATA} reload'
service_restart_command='pg_ctl -D ${PGDATA} restart -m fast'

By default, when restarting a node for a standby follow operation, repmgr will stop and then start the server using pg_ctl, as pg_ctl restart has proven to be problematic in some environments. However, the opposite might be the case here. Either way, we strongly recommend using the OS-level service commands where available to avoid issues like this (not sure if those would be available here).
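
To illustrate the suggestion above, here is a minimal sketch of how the config-generating script from the original post could emit the four service commands uncommented. It assumes (this is not confirmed in the thread) that the lines are written from a shell script, so the data directory is expanded when the file is generated. `render_service_commands` is a hypothetical helper, the `pg_ctl` path is taken from the `pg_bindir` value in the original post, and the fallback data directory is the official image's default.

```shell
#!/bin/sh
# Sketch only: prints repmgr.conf lines, it does not touch any file by itself.

# Taken from pg_bindir in the original post; adjust for other PostgreSQL versions.
PG_BINDIR=/usr/lib/postgresql/10/bin

# Hypothetical helper: print the four service_* settings for a data directory.
# $1 = PostgreSQL data directory
render_service_commands() {
    data_dir=$1
    # Unquoted heredoc: $PG_BINDIR and $data_dir are expanded here,
    # the single quotes are written to the output literally.
    cat <<EOF
service_start_command='$PG_BINDIR/pg_ctl -D $data_dir start'
service_stop_command='$PG_BINDIR/pg_ctl -D $data_dir stop -m fast'
service_reload_command='$PG_BINDIR/pg_ctl -D $data_dir reload'
service_restart_command='$PG_BINDIR/pg_ctl -D $data_dir restart -m fast'
EOF
}

# Assumption: fall back to the official image's default PGDATA if it is unset.
render_service_commands "${PGDATA:-/var/lib/postgresql/data}"
```

The output could then be appended to the generated file, for example with `render_service_commands "$PGDATA" >> /etc/repmgr.conf`.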

ibarwick avatar Oct 06 '20 00:10 ibarwick

Also, I see from the Stack Exchange post that you're using repmgr 5.0; we strongly recommend using repmgr 5.1, the latest version.

ibarwick avatar Oct 06 '20 01:10 ibarwick

@ibarwick yes, I tried them without the '#'.

As for repmgr, here is how I install it:

RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main 10" \
          >> /etc/apt/sources.list.d/pgdg.list
RUN apt-get update; apt-get install -y postgresql-10-repmgr repmgr-common

Could you please help me? Where can I download it?

m-jayson avatar Oct 06 '20 01:10 m-jayson

The 5.1 version, that is. I assume the repmgr commands stay the same and only the version changes.

Anyway, I found it.

RUN curl https://dl.2ndquadrant.com/default/release/get/deb | bash
RUN apt-get update && apt-get install postgresql-11-repmgr repmgr-common -y

I'll try the changes you recommended and get back to you later.

m-jayson avatar Oct 06 '20 01:10 m-jayson

@ibarwick it seems that the Docker image doesn't include the systemctl command. I also updated repmgr to 5.1, but still no luck.

m-jayson avatar Oct 06 '20 02:10 m-jayson

In that case I'm not sure what can be done. As stated before, we haven't tested this on Docker at all, so it's hard to see what the issue might be. If time permits I'll see if I can reproduce this later in the week, but can't promise anything.

ibarwick avatar Oct 06 '20 05:10 ibarwick

@ibarwick thanks. How do you start repmgr, by the way?

This is how I do it:

#!/bin/bash

repmgrd -v 

m-jayson avatar Oct 06 '20 06:10 m-jayson

Aha, if you start it like that, it's probably not daemonizing properly.

Try something like:

repmgrd -f /etc/repmgr.conf --daemonize --pid-file=/tmp/repmgrd.pid >> /tmp/repmgrd.log 2>&1

ibarwick avatar Oct 06 '20 11:10 ibarwick

Oh, thanks. I'll give it a try.

m-jayson avatar Oct 06 '20 12:10 m-jayson

@ibarwick do we have to stop the PostgreSQL server whenever we register a node as primary or standby?

m-jayson avatar Oct 06 '20 15:10 m-jayson

@ibarwick I think I have fixed it already (image).

Thanks for your help.

Now I still have two things left to sort out:

  1. This part, which I think is dirty:
repmgrd --verbose >> /tmp/repmgrd.log 2>&1
	tail -f /tmp/repmgrd.log

I have to tail the log because otherwise the Docker container exits right away.

  2. When I take the first node down and then bring it back up, it still reports itself as primary. If you have an approach for making it rejoin as a standby instead, since a new primary has already been elected, that would be a great help.
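
The two open points above could be approached roughly as follows. This is a sketch under assumptions, not something that was tested in Docker in this thread. For the first point, repmgrd accepts `--daemonize=false` to stay in the foreground, which would remove the need for `tail -f`; with `log_facility=STDERR` as in the original config, the log then goes to the container's output. For the second point, `repmgr node rejoin` with `--force-rewind` can reattach a former primary as a standby. The rejoin requires PostgreSQL on the old primary to be shut down, and pg_rewind needs `wal_log_hints` or data checksums to be enabled. The helper names and the `postgres-2` host are hypothetical, and the functions only print the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch only: the functions print commands instead of executing them.

REPMGR_CONF=/etc/repmgr.conf

# Point 1: keep repmgrd in the foreground so the container stays alive
# without tailing a log file. In an entrypoint, the printed command would
# typically be run with "exec" as the last line.
repmgrd_foreground_cmd() {
    printf 'repmgrd -f %s --daemonize=false\n' "$REPMGR_CONF"
}

# Point 2: rejoin a former primary as a standby of the new primary.
# $1 = conninfo of the current primary (hypothetical host in the example below)
# $2 = "dry-run" to only check the prerequisites without changing anything
node_rejoin_cmd() {
    primary_conninfo=$1
    mode=${2:-}
    cmd="repmgr -f $REPMGR_CONF node rejoin -d '$primary_conninfo' --force-rewind --verbose"
    if [ "$mode" = "dry-run" ]; then
        cmd="$cmd --dry-run"
    fi
    printf '%s\n' "$cmd"
}

repmgrd_foreground_cmd
# Assumption: postgres-2 is the node that was promoted during the failover.
node_rejoin_cmd 'host=postgres-2 user=repmgr dbname=repmgr connect_timeout=2' dry-run
```

Running the rejoin with `--dry-run` first shows whether the prerequisites are met before anything on the old primary is modified.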

Thanks,

m-jayson avatar Oct 06 '20 19:10 m-jayson