Pg_auto_failover with HAProxy
Hi,
Is it possible to configure the pg_auto_failover with haproxy like if the primary becomes standby and standby becomes primary the pg_auto_failover can communicate its status to haprxoy so haproxy can redirect the write request to primary and read request to standby ?
As far as I know, it's not possible. Because you have to maintain the mointor nodes with another mointor.
Sorry, I did't unstand your question first. It's possible to use keepalived to use virtual ip as read-write interface.
Thank you for the update. You mean to use virtual ip as node ip and configure it in pg auto failover?
You can include some script to detect the primary node and bind the virtual ip on the primary. Find below configuration file I'm using. Frankly I'm using keepalived with patroni right now, though I don't like python for this kind of application.
global_defs {
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_script check_postgresql {
script "/usr/bin/su - postgres -c \"pg_controldata /var/lib/pgsql/17/data | grep 'in production'\""
user root
interval 2
fall 2
rise 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 52
nopreempt
priority 200
advert_int 1
authentication {
auth_type PASS
auth_pass 123456
}
unicast_src_ip 192.168.0.7
unicast_peer {
192.168.0.8
}
virtual_ipaddress {
192.168.0.20/27 dev eth0 label eth0:1
}
garp_master_delay 1
garp_master_refresh 5
track_script {
check_postgresql
}
}
I'm also experimenting with that. What I'm testing right now:
haproxy.cfg
global
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
defaults
mode tcp
log global
retries 3
timeout queue 1m
timeout connect 1s
timeout client 3600s
timeout server 3600s
timeout check 2s
maxconn 500
frontend stats
mode http
bind *:8182
stats enable
stats uri /stats
stats refresh 10s
stats auth admin:password
stats uri /haproxy?stats
frontend db
bind *:5432
mode tcp
default_backend postgres
backend postgres
mode tcp
balance roundrobin
option httpchk
http-check send meth GET uri /health
http-check expect status 200
server db1 primary:5432 check port 5433
server db2 standby:5432 check port 5433 backup
/us/local/bin/pgcheck
#!/bin/bash
#
# This script checks if a PostgreSQL server is healthy running on localhost.
# credits to several9s: https://severalnines.com/blog/postgresql-load-balancing-using-haproxy-keepalived/,
# https://www.percona.com/blog/postgresql-application-connection-failover-using-haproxy-with-xinetd/
export PGDATABASE='postgres'
export PGCONNECT_TIMEOUT=10
STANDBY_CHECK="SELECT pg_is_in_recovery()"
WRITABLE_CHECK="SHOW transaction_read_only"
# Check the status of PostgreSQL
READY=$(pg_isready 2> /dev/null)
if [ $? -ne 0 ]; then
exit 1
fi
# check if in recovery mode (that means it is a 'standby')
STANDBY=$(psql -qt -c "$STANDBY_CHECK" 2>/dev/null)
if [ $? -ne 0 ]; then
exit 1
elif echo $STANDBY | egrep -i "(t|true|on|1)" 2>/dev/null >/dev/null; then
echo -e "Status: 202 Accepted\n\nstandby\n"
exit 0
fi
# check if writable (then we consider it as a 'primary')
READONLY=$(psql -qt -c "$WRITABLE_CHECK" 2>/dev/null)
if [ $? -ne 0 ]; then
exit 1
elif echo $READONLY | egrep -i "(f|false|off|0)" 2>/dev/null >/dev/null; then
echo -e "Status: 200 OK\n\nprimary\n"
exit 0
fi
exit 1
shell2http -no-index -port="$HEALTHCHECK_PORT" -500 -log=/dev/null -cgi /health pgcheck &
What it does: every 10s the url /health is requested. If it's primary it returns 200. For standby it returns HTTP Status 202 and for down 500. Only 200 Status is UP.
It does work OK. However, I think it's better to use multi host URL in the client.
Hello @dantio , could you explain the reasons why you think it's better to use multi host URL in the client (libpq) performances issue ?
Some editors applications may not be able to use libpq multi host connexions. in these cases, a KE ou HAproxy solution could be fine.
I've experimented antother problem with psql && multi host connexions : If the first server of the connexion string is down (network issue, powerOff,...), then it will wait for PGCONNECT_TIMEOUT sec (default is more than 2mins, min is 1s)
# without setting PGCONNECT_TIMEOUT
$ time psql "postgres://vpostgres-test1-ope:8433,vpostgres-test2-ope:8433/postgres?target_session_attrs=read-write&sslmode=require" -U postgres -Aqtc "select pg_read_file('/etc/hostname');"
vpostgres-test2-ope
real 2m9,511s
user 0m0,037s
sys 0m0,004s
# setting PGCONNECT_TIMEOUT to 1s
$ export PGCONNECT_TIMEOUT=1
$ time psql "postgres://vpostgres-test1-ope:8433,vpostgres-test2-ope:8433/postgres?target_session_attrs=read-write&sslmode=require" -U postgres -Aqtc "select pg_read_file('/etc/hostname');"
vpostgres-test2-ope
real 0m1,054s
user 0m0,036s
sys 0m0,004s
# Using connect_timeout in connexion string
$ time psql "postgres://vpostgres-test1-ope:8433,vpostgres-test2-ope:8433/postgres?target_session_attrs=read-write&sslmode=require&connect_timeout=1" -U postgres -Aqtc "select pg_read_file('/etc/hostname');"
vpostgres-test2-ope
real 0m1,056s
user 0m0,031s
sys 0m0,014s
so each in this case (host1 offline), each connection takes 1s (min) to proceed... which is "not really" performant... That's why I look for KE / Hap /.. solutions...
@PatrickBzh With multi host URL you don't have another layer between the App and the DB. Therefore, I was thinking that this solution is better, since lightweight. Since the DB switchover/failover should only seldom happen, 1sec is OK I think...
However, our Clients do not support multi Host URL and we actually never used is so far. We have adjusted the haproxy a litte:
defaults
mode tcp
log global
# retries 3 # retry forever
timeout queue 1m
timeout connect 4s
timeout client 10m
timeout server 10m
timeout check 2s
maxconn 2000
resolvers docker
nameserver dns1 127.0.0.11:53
resolve_retries 3
timeout resolve 1s
timeout retry 1s
hold other 10s
hold refused 10s
hold nx 10s
hold timeout 10s
hold valid 10s
hold obsolete 10s
frontend stats
mode http
bind *:8182
stats enable
stats refresh 10s
stats auth haproxy:password
stats uri /haproxy?stats
frontend db
bind *:5432
mode tcp
default_backend postgres
backend postgres
mode tcp
balance first
option httpchk GET /health
http-check expect status 200
# 5433 is the health check port
server master db_master:5432 check inter 3s check resolvers docker init-addr libc,none port 5433 on-marked-down shutdown-sessions
server backup db_standby:5432 check inter 3s check resolvers docker init-addr libc,none port 5433 on-marked-down shutdown-sessions
We are happy with this config. Switchover/Failover happens immediately in our Clients (NestJS+Slonik)
We have been running his config in production for a while now:
https://github.com/neuroforgede/pg_auto_failover_ansible/wiki/HAProxy
@s4ke nice!
I have really found some gems in NeuroForge repositories. Thank you for contributing to OS👍🏼