Set the pgautofailover_monitor user's password to NULL.

Open DimCitus opened this issue 3 years ago • 3 comments

We don't actually need a password for this user, given that the monitor doesn't authenticate when doing health checks. Not having a password to manage also makes it easier to be compliant with the password storage policies of our users (e.g. md5 vs scram-sha-256).

Fixes #763.

DimCitus, Jul 09 '21 13:07

I don't think this change is good in its current form. If you have the password set to NULL and set pg_hba.conf to only allow scram-sha-256 (or md5), then the postgres logs on node1 are flooded with the following FATAL errors (a new log entry every second): [image]

JelteF, Aug 19 '21 15:08

See also https://github.com/citusdata/pg_auto_failover/pull/672 which has similar errors.

JelteF, Aug 19 '21 15:08

I just came across this because I was having a similar issue. In our setup script we configure a random password for the pgautofailover_monitor user. With 1.4.2 we had no issues, but now our logs are full of entries like:

2021-10-11 21:53:35.527 UTC [120300] postgres [unknown] pgautofailover_monitor 10.128.0.3 6164b25f.1d5ec FATAL:  password authentication failed for user "pgautofailover_monitor"
2021-10-11 21:53:35.527 UTC [120300] postgres [unknown] pgautofailover_monitor 10.128.0.3 6164b25f.1d5ec DETAIL:  Connection matched pg_hba.conf line 105: "hostssl all             pgautofailover_monitor        10.0.0.0/8    scram-sha-256"
2021-10-11 21:53:35.529 UTC [120301] postgres [unknown] pgautofailover_monitor 10.128.0.3 6164b25f.1d5ed FATAL:  no pg_hba.conf entry for host "10.128.0.3", user "pgautofailover_monitor", database "postgres", SSL off
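
The two FATAL lines above look like one SSL connection attempt followed by a non-SSL retry: the SSL attempt matches the `hostssl` rule and then fails password authentication, while the non-SSL attempt matches no rule at all. The sketch below illustrates that reading with a deliberately simplified matcher. The function name `hba_matches` is hypothetical, and the matching logic is an assumption that covers only a small subset of what PostgreSQL actually does (no `samehost`, no group or regex users, no hostname addresses).

```python
import ipaddress


def hba_matches(hba_line, user, database, client_ip, ssl):
    """Return True if a simplified pg_hba.conf rule applies to a connection.

    Hypothetical helper for illustration only; it handles just the
    host/hostssl/hostnossl record types with a CIDR address.
    """
    conn_type, hba_db, hba_user, address = hba_line.split()[:4]

    # hostssl rules apply only to SSL connections, hostnossl only to non-SSL.
    if conn_type == "hostssl" and not ssl:
        return False
    if conn_type == "hostnossl" and ssl:
        return False
    if conn_type not in ("host", "hostssl", "hostnossl"):
        return False

    # "all" is a wildcard for both the database and the user column.
    if hba_db != "all" and hba_db != database:
        return False
    if hba_user != "all" and hba_user != user:
        return False

    # The client address must fall inside the rule's CIDR range.
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(address)


# The rule quoted in the DETAIL line of the log above.
RULE = "hostssl all             pgautofailover_monitor        10.0.0.0/8    scram-sha-256"

ssl_attempt = hba_matches(RULE, "pgautofailover_monitor", "postgres", "10.128.0.3", ssl=True)
plain_attempt = hba_matches(RULE, "pgautofailover_monitor", "postgres", "10.128.0.3", ssl=False)

print(ssl_attempt)    # True: rule matches, so the password check runs (and fails)
print(plain_attempt)  # False: no rule applies, hence "no pg_hba.conf entry ... SSL off"
```

If that reading is right, the `SSL off` error is a side effect of the failed password check on the SSL attempt rather than a separate misconfiguration.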

After finding this issue and setting the password to the hardcoded value of pgautofailover_monitor, the error stopped and I saw this in the events:

 2021-10-11 21:53:57.570284+00 |    0/7 |           secondary |           secondary | Node node 7 "node_7" (citus-andres-dev-coord-b-34m8.c.acme-qa01.internal:5432) is marked as healthy by the monitor
 2021-10-11 21:54:23.882783+00 |    0/6 |             primary |             primary | Node node 6 "node_6" (citus-andres-dev-coord-a-z1x0.c.acme-qa01.internal:5432) is marked as healthy by the monitor

I also now see the nodes reported as read-write or read-only.

It would be great if this were better documented somewhere in the docs.

Thanks!

amontalban, Oct 11 '21 21:10