repmgr icon indicating copy to clipboard operation
repmgr copied to clipboard

Allow repmgr to identify the current Primary node

Open gclough opened this issue 7 years ago • 7 comments

At the moment we can run repmgr cluster show, to list all nodes in the cluster:

$ repmgr cluster show
 ID | Name                    | Role    | Status    | Upstream                | Location | Connection string
----+-------------------------+---------+-----------+-------------------------+----------+-------------------------------------------------------
 1  | postgres01.local.domain | primary | * running |                         | dc1      | host=postgres01.local.domain user=repmgr dbname=repmgr
 2  | postgres02.local.domain | standby |   running | postgres01.local.domain | dc1      | host=postgres02.local.domain user=repmgr dbname=repmgr
 3  | postgres03.local.domain | standby |   running | postgres01.local.domain | dc1      | host=postgres03.local.domain user=repmgr dbname=repmgr
 4  | postgres04.local.domain | standby |   running | postgres01.local.domain | dc2      | host=postgres04.local.domain user=repmgr dbname=repmgr
 5  | postgres04.local.domain | standby |   running | postgres01.local.domain | dc2      | host=postgres05.local.domain user=repmgr dbname=repmgr

But what I'd like is some reliable way for repmgr to simply return the hostname (or maybe the connect string) of the Primary host.

I can probably get something similar with:

repmgr cluster show --csv | while read LINE; do
  NODE_ID="`echo $LINE | cut -f1 -d,`"
  AVAILABILITY="`echo $LINE | cut -f2 -d,`"
  RECOVERY_STATE="`echo $LINE | cut -f3 -d,`"
  if [ "$AVAILABILITY" -eq 0 -a "$RECOVERY_STATE" -eq 0 ]; then
    repmgr cluster show | grep "^ $NODE_ID "
  fi
done | cut -f2 -d\| | sed s/' '//g

... but that seems a little messy. Ideally, repmgr is the single source of truth, so I'd like to run a simple test to see who is the Primary.

gclough avatar Jul 02 '18 21:07 gclough

Makes sense, IIRC it's been requested before but fell off the back burner at some point.

I'm thinking of something like: repmgr cluster show --primary to display just records for nodes detected as primary. --csv to output the basics in the same format as now, and --csv --verbose to output all the information shown in normal mode.

As it's conceivable there might be more than one active primary, or no active primary at all, error codes like ERR_SPLIT_BRAIN or ERR_NO_PRIMARY might be a good idea.

ibarwick avatar Jul 12 '18 08:07 ibarwick

That all seems very sensible. I presume that the ERR_* status will result in a non-zero returncode?

Also, possibly, would be make sense to have a special case where "this host" is the current (and only) Primary? It may be nice to have an option of running things only on the current Primary. What if a new option isprimary is added, along with a -q, --quiet option to suppress output:

repmgr cluster isprimary -q && /some/script/for/PRIMARY_ONLY.sh

or maybe also include isstandby, to test for a valid & active standby too:

repmgr cluster isstandby -q && /some/script/for/STANDBY_ONLY.sh

Just ideas...

gclough avatar Jul 12 '18 08:07 gclough

I have the same question. What is the proper way to determine the current primary node using (some combination of?) node status, node check, or cluster show command(s)? It looks like one way is to call cluster show and find the node whose Role="primary" and Status="* running". Is this correct, or is gclough's script (in his original question) correct, or is there an easier way?

ckolovson avatar Feb 11 '19 21:02 ckolovson

Right now the best option is doing something with the output of repmgr cluster show --csv.

Note that if you have more than one node in recovery, this won't tell you which node is the actual current primary.

This feature has not been forgotten and will be implemented as part of a wider set of upcoming improvements.

ibarwick avatar Feb 12 '19 08:02 ibarwick

The problem with this is that you cannot trust the database in case failover was partially successful. For instance, you have promoted the slave successfully but the old master is still running but is not replicated anymore. Now your repmgr database is out of sync. One node will represent the correct state, the other will have the state from before the role switch. So this question is already popping up the moment you decide to use a query and you have to ask: what node will I run this on.

I've tried to wrap my head around this and I find no answer that will suit all cases. The way I solved this over here is to use haproxy to determine the correct node by running queries to check if a node is replicating or not. In case it find that 2 servers are in master mode, it will cut the clients because it cannot be determined who the correct one is.

I've written an haproxy check to help with this. see https://github.com/gplv2/haproxy-postgresql

gplv2 avatar Feb 18 '19 09:02 gplv2

Is this going to be implemented in the future ?

gjrtimmer avatar Apr 18 '23 10:04 gjrtimmer

At the moment we can run repmgr cluster show, to list all nodes in the cluster:

$ repmgr cluster show
 ID | Name                    | Role    | Status    | Upstream                | Location | Connection string
----+-------------------------+---------+-----------+-------------------------+----------+-------------------------------------------------------
 1  | postgres01.local.domain | primary | * running |                         | dc1      | host=postgres01.local.domain user=repmgr dbname=repmgr
 2  | postgres02.local.domain | standby |   running | postgres01.local.domain | dc1      | host=postgres02.local.domain user=repmgr dbname=repmgr
 3  | postgres03.local.domain | standby |   running | postgres01.local.domain | dc1      | host=postgres03.local.domain user=repmgr dbname=repmgr
 4  | postgres04.local.domain | standby |   running | postgres01.local.domain | dc2      | host=postgres04.local.domain user=repmgr dbname=repmgr
 5  | postgres04.local.domain | standby |   running | postgres01.local.domain | dc2      | host=postgres05.local.domain user=repmgr dbname=repmgr

But what I'd like is some reliable way for repmgr to simply return the hostname (or maybe the connect string) of the Primary host.

I can probably get something similar with:

repmgr cluster show --csv | while read LINE; do
  NODE_ID="`echo $LINE | cut -f1 -d,`"
  AVAILABILITY="`echo $LINE | cut -f2 -d,`"
  RECOVERY_STATE="`echo $LINE | cut -f3 -d,`"
  if [ "$AVAILABILITY" -eq 0 -a "$RECOVERY_STATE" -eq 0 ]; then
    repmgr cluster show | grep "^ $NODE_ID "
  fi
done | cut -f2 -d\| | sed s/' '//g

... but that seems a little messy. Ideally, repmgr is the single source of truth, so I'd like to run a simple test to see who is the Primary.

provided script does not work when there is a witness server

gjrtimmer avatar Apr 18 '23 10:04 gjrtimmer