[bitnami/postgresql-repmgr] Witness implementation missing | Prevent split brain scenarios
Name and Version
bitnami/postgresql-repmgr:all
What is the problem this feature will solve?
The PostgreSQL replication manager repmgr has a feature called witness (see witness server), which is meant to prevent a split-brain scenario when the primary node fails or goes down.
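To illustrate why a third voter helps, here is a minimal sketch of a strict-majority check. It is only an illustration of the quorum idea, not repmgr's actual election logic; the function names `has_quorum` and `may_promote` are hypothetical, and the node names are borrowed from the compose files later in this thread.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Strict majority: more than half of all registered nodes are visible."""
    return visible_nodes > total_nodes / 2


def may_promote(partition: set, cluster: set) -> bool:
    """A side of a network split may promote only if it holds a majority."""
    return has_quorum(len(partition & cluster), len(cluster))


# Two data nodes, split from each other: neither side has a majority,
# so no side can safely decide that it should become the primary.
two_nodes = {"pg-0", "pg-1"}
print(may_promote({"pg-1"}, two_nodes))  # False

# With a witness there are three voters, so exactly one side can win.
with_witness = {"pg-0", "pg-1", "pgw-0"}
print(may_promote({"pg-1", "pgw-0"}, with_witness))  # True
print(may_promote({"pg-0"}, with_witness))  # False
```

The witness holds no application data in this picture; it only contributes a vote so that at most one side of a partition can reach a majority.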
What is the feature you are proposing to solve the problem?
Your implementation of bitnami/postgresql-repmgr is completely missing the witness (server) feature. It cannot be configured with your Docker image, and it is not mentioned anywhere in the source. I think most people use your postgresql-repmgr image for HA scenarios, and the default way of handling outages/failures is not covered. See the article https://www.enterprisedb.com/blog/pg-phriday-isolating-postgres-repmgr; it is a pretty good explanation if you are not familiar with the topic. Auto-fencing could be a follow-up topic once the witness feature is implemented.
What alternatives have you considered?
I am waiting for MR https://github.com/bitnami/containers/pull/1492, but this seems more like a hack to me.
Hi,
Thank you so much for the feature request! I think it makes sense to add support for witness servers in the container and chart. I will open a task for that. If you want to speed up the process, feel free to submit a PR and we will review it :D
It looks like there is an attempt to add this feature: https://github.com/bitnami/containers/pull/20558/files
Hi, we have made a new release that includes this functionality. A new postgresql-ha chart release that supports creating witness nodes has also been made. Could you take a look at them?
I'll check it out soon, and give some feedback. Thanks!
I have trouble setting it up. The container shuts itself down. Here is my compose:
version: '3.8'
services:
  pg-0:
    image: bitnami/postgresql-repmgr:latest
    ports:
      - 6430:5432
    volumes:
      - /docker/local/database_repmgr2/pg-0:/bitnami/postgresql
    environment:
      - POSTGRESQL_POSTGRES_PASSWORD=adminpassword
      - POSTGRESQL_USERNAME=customuser
      - POSTGRESQL_PASSWORD=custompassword
      - POSTGRESQL_DATABASE=customdatabase
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_PRIMARY_PORT=5432
      - REPMGR_PARTNER_NODES=pg-0:5432,pg-1:5432
      - REPMGR_NODE_NAME=pg-0
      - REPMGR_NODE_NETWORK_NAME=pg-0
      - REPMGR_PORT_NUMBER=5432
    deploy:
      resources:
        limits:
          memory: 5G
      replicas: 1
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.role == worker
          - node.labels.epyc1 == true
  pg-1:
    image: bitnami/postgresql-repmgr:latest
    ports:
      - 6431:5432
    volumes:
      - /docker/local/database_repmgr2/pg-1:/bitnami/postgresql
    environment:
      - POSTGRESQL_POSTGRES_PASSWORD=adminpassword
      - POSTGRESQL_USERNAME=customuser
      - POSTGRESQL_PASSWORD=custompassword
      - POSTGRESQL_DATABASE=customdatabase
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_PRIMARY_PORT=5432
      - REPMGR_PARTNER_NODES=pg-0:5432,pg-1:5432
      - REPMGR_NODE_NAME=pg-1
      - REPMGR_NODE_NETWORK_NAME=pg-1
      - REPMGR_PORT_NUMBER=5432
    deploy:
      resources:
        limits:
          memory: 5G
      replicas: 1
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.role == worker
          - node.labels.epyc2 == true
  pgw-0:
    image: bitnami/postgresql-repmgr:latest
    ports:
      - 6439:5432
    volumes:
      - /docker/local/database_repmgr2/pgw-0:/bitnami/postgresql
    environment:
      - POSTGRESQL_POSTGRES_PASSWORD=adminpassword
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pgw-0
      - REPMGR_PRIMARY_PORT=5432
      - REPMGR_PARTNER_NODES=pg-0:5432,pg-1:5432
      - REPMGR_NODE_NAME=pgw-0
      - REPMGR_NODE_NETWORK_NAME=pgw-0
      - REPMGR_PORT_NUMBER=5432
      - REPMGR_NODE_TYPE=witness
      - BITNAMI_DEBUG=true
    deploy:
      resources:
        limits:
          memory: 5G
      replicas: 1
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.role == worker
          - node.labels.epyc2 == true
The witness container then fails on the first start (no database exists yet) with:
Success. You can now start the database server using:
/opt/bitnami/postgresql/bin/pg_ctl -D /bitnami/postgresql/data -l logfile start
initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
postgresql-repmgr 15:07:12.39 INFO ==> Starting PostgreSQL in background...
waiting for server to start....2023-01-27 15:07:12.432 GMT [165] LOG: pgaudit extension initialized
2023-01-27 15:07:12.443 GMT [165] LOG: redirecting log output to logging collector process
2023-01-27 15:07:12.443 GMT [165] HINT: Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2023-01-27 15:07:12.443 GMT [165] LOG: starting PostgreSQL 15.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-27 15:07:12.444 GMT [165] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-01-27 15:07:12.444 GMT [165] LOG: could not bind IPv6 address "::1": Cannot assign requested address
2023-01-27 15:07:12.450 GMT [165] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-01-27 15:07:12.458 GMT [169] LOG: database system was shut down at 2023-01-27 15:07:12 GMT
2023-01-27 15:07:12.467 GMT [165] LOG: database system is ready to accept connections
done
server started
CREATE DATABASE
postgresql-repmgr 15:07:12.61 INFO ==> Changing password of postgres
ALTER ROLE
postgresql-repmgr 15:07:12.64 INFO ==> Creating user customuser
CREATE ROLE
postgresql-repmgr 15:07:12.67 INFO ==> Granting access to "customuser" to the database "customdatabase"
GRANT
ALTER DATABASE
postgresql-repmgr 15:07:12.71 INFO ==> Setting ownership for the 'public' schema database "customdatabase" to "customuser"
ALTER SCHEMA
postgresql-repmgr 15:07:12.74 INFO ==> Creating replication user repmgr
CREATE ROLE
postgresql-repmgr 15:07:12.78 INFO ==> Stopping PostgreSQL...
waiting for server to shut down....2023-01-27 15:07:12.790 GMT [165] LOG: received fast shutdown request
2023-01-27 15:07:12.793 GMT [165] LOG: aborting any active transactions
2023-01-27 15:07:12.796 GMT [165] LOG: background worker "logical replication launcher" (PID 173) exited with exit code 1
2023-01-27 15:07:12.796 GMT [167] LOG: shutting down
2023-01-27 15:07:12.821 GMT [167] LOG: checkpoint starting: shutdown immediate
2023-01-27 15:07:12.928 GMT [167] LOG: checkpoint complete: wrote 927 buffers (5.7%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.025 s, sync=0.018 s, total=0.110 s; sync files=257, longest=0.004 s, average=0.001 s; distance=11273 kB, estimate=11273 kB
2023-01-27 15:07:12.940 GMT [165] LOG: database system is shut down
done
server stopped
postgresql-repmgr 15:07:13.02 INFO ==> Configuring replication parameters
postgresql-repmgr 15:07:13.07 INFO ==> Configuring fsync
postgresql-repmgr 15:07:13.09 INFO ==> Starting PostgreSQL in background...
waiting for server to start....2023-01-27 15:07:13.134 GMT [244] LOG: pgaudit extension initialized
2023-01-27 15:07:13.146 GMT [244] LOG: redirecting log output to logging collector process
2023-01-27 15:07:13.146 GMT [244] HINT: Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2023-01-27 15:07:13.146 GMT [244] LOG: starting PostgreSQL 15.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-27 15:07:13.146 GMT [244] LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-01-27 15:07:13.147 GMT [244] LOG: listening on IPv6 address "::", port 5432
2023-01-27 15:07:13.152 GMT [244] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-01-27 15:07:13.161 GMT [248] LOG: database system was shut down at 2023-01-27 15:07:12 GMT
2023-01-27 15:07:13.169 GMT [244] LOG: database system is ready to accept connections
done
server started
postgresql-repmgr 15:07:13.23 INFO ==> Creating repmgr user: repmgr
ERROR: role "repmgr" already exists
2023-01-27 15:07:13.262 GMT [259] ERROR: role "repmgr" already exists
2023-01-27 15:07:13.262 GMT [259] STATEMENT: CREATE ROLE "repmgr" WITH LOGIN CREATEDB PASSWORD 'repmgrpassword';
ALTER ROLE
ALTER ROLE
postgresql-repmgr 15:07:13.32 INFO ==> Creating repmgr database: repmgr
CREATE DATABASE
postgresql-repmgr 15:07:13.40 INFO ==> Unregistering witness node...
NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"
ERROR: _get_primary_connection(): unable to retrieve node records
DETAIL:
ERROR: relation "repmgr.nodes" does not exist
LINE 1: ...imary' THEN 1 ELSE 2 END AS type_priority FROM repmgr.nod...
^
DETAIL: query text is:
SELECT node_id, conninfo, CASE WHEN type = 'primary' THEN 1 ELSE 2 END AS type_priority FROM repmgr.nodes WHERE active IS TRUE AND type != 'witness' ORDER BY active DESC, type_priority, priority, node_id
ERROR: unable to connect to primary
DETAIL:
connection pointer is NULL
2023-01-27 15:07:13.427 GMT [275] ERROR: relation "repmgr.nodes" does not exist at character 108
2023-01-27 15:07:13.427 GMT [275] STATEMENT: SELECT node_id, conninfo, CASE WHEN type = 'primary' THEN 1 ELSE 2 END AS type_priority FROM repmgr.nodes WHERE active IS TRUE AND type != 'witness' ORDER BY active DESC, type_priority, priority, node_id
postgresql-repmgr 15:07:13.43 INFO ==> Registering witness node...
postgresql-repmgr 15:07:13.43 INFO ==> Waiting for primary node...
postgresql-repmgr 15:07:13.44 DEBUG ==> Wait for schema repmgr.repmgr on 'pg-0:5432', will try 6 times with 10 delay seconds (TIMEOUT=60)
postgresql-repmgr 15:07:13.47 DEBUG ==> Schema repmgr.repmgr exists!
NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
[REPMGR EVENT] Node id: 1000; Event type: cluster_created; Success [1|0]: 1; Time: 2023-01-27 15:07:13.577239+00; Details:
Looking for the script: /opt/bitnami/repmgr/events/execs/cluster_created.sh
[REPMGR EVENT] no script '/opt/bitnami/repmgr/events/execs/cluster_created.sh' found. Skipping...
ERROR: node "pgw-0" (ID: 1000) is already registered as a primary node
HINT: use "repmgr primary unregister" to remove a non-witness node record
postgresql-repmgr 15:07:13.61 INFO ==> Stopping PostgreSQL...
waiting for server to shut down....2023-01-27 15:07:13.626 GMT [244] LOG: received fast shutdown request
2023-01-27 15:07:13.628 GMT [244] LOG: aborting any active transactions
2023-01-27 15:07:13.630 GMT [244] LOG: background worker "logical replication launcher" (PID 252) exited with exit code 1
2023-01-27 15:07:13.631 GMT [246] LOG: shutting down
2023-01-27 15:07:13.670 GMT [246] LOG: checkpoint starting: shutdown immediate
2023-01-27 15:07:13.722 GMT [246] LOG: checkpoint complete: wrote 939 buffers (5.7%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.025 s, sync=0.018 s, total=0.055 s; sync files=259, longest=0.004 s, average=0.001 s; distance=16384 kB, estimate=16384 kB
2023-01-27 15:07:13.734 GMT [244] LOG: database system is shut down
done
server stopped
On subsequent starts it fails with:
postgresql-repmgr 15:09:54.19
postgresql-repmgr 15:09:54.19 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 15:09:54.19 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql-repmgr 15:09:54.19 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql-repmgr 15:09:54.20
postgresql-repmgr 15:09:54.21 DEBUG ==> Configuring libnss_wrapper...
postgresql-repmgr 15:09:54.23 INFO ==> ** Starting PostgreSQL with Replication Manager setup **
postgresql-repmgr 15:09:54.26 INFO ==> Validating settings in REPMGR_* env vars...
postgresql-repmgr 15:09:54.27 INFO ==> Validating settings in POSTGRESQL_* env vars..
postgresql-repmgr 15:09:54.28 INFO ==> Querying all partner nodes for common upstream node...
postgresql-repmgr 15:09:54.29 DEBUG ==> Checking node 'pg-0:5432'...
postgresql-repmgr 15:09:54.35 DEBUG ==> Pretending primary role node - 'pg-0:5432'
postgresql-repmgr 15:09:54.35 DEBUG ==> Pretending primary set to 'pg-0:5432'!
postgresql-repmgr 15:09:54.36 DEBUG ==> Checking node 'pg-1:5432'...
postgresql-repmgr 15:09:54.42 DEBUG ==> Pretending primary role node - 'pg-0:5432'
postgresql-repmgr 15:09:54.42 INFO ==> Auto-detected primary node: 'pg-0:5432'
postgresql-repmgr 15:09:54.42 DEBUG ==> Primary node: 'pg-0:5432'
postgresql-repmgr 15:09:54.43 INFO ==> Node configured as witness
postgresql-repmgr 15:09:54.44 INFO ==> Preparing PostgreSQL configuration...
postgresql-repmgr 15:09:54.45 DEBUG ==> Injecting a new postgresql.conf file...
postgresql-repmgr 15:09:54.45 INFO ==> postgresql.conf file not detected. Generating it...
postgresql-repmgr 15:09:54.58 DEBUG ==> Injecting a new pg_hba.conf file...
postgresql-repmgr 15:09:54.59 INFO ==> Preparing repmgr configuration...
postgresql-repmgr 15:09:54.61 DEBUG ==> Node ID: '1000', Rol: 'witness', Primary Node: 'pg-0:5432'
postgresql-repmgr 15:09:54.61 INFO ==> Initializing Repmgr...
postgresql-repmgr 15:09:54.63 INFO ==> Initializing PostgreSQL database...
postgresql-repmgr 15:09:54.63 DEBUG ==> Copying files from /bitnami/postgresql/conf to /opt/bitnami/postgresql/conf
postgresql-repmgr 15:09:54.64 INFO ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected
postgresql-repmgr 15:09:54.64 INFO ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
postgresql-repmgr 15:09:54.65 DEBUG ==> Ensuring expected directories/files exist...
postgresql-repmgr 15:09:54.70 INFO ==> Deploying PostgreSQL with persisted data...
postgresql-repmgr 15:09:54.73 INFO ==> Configuring replication parameters
postgresql-repmgr 15:09:54.78 INFO ==> Configuring fsync
postgresql-repmgr 15:09:54.80 INFO ==> Starting PostgreSQL in background...
waiting for server to start....2023-01-27 15:09:54.855 GMT [176] LOG: pgaudit extension initialized
2023-01-27 15:09:54.870 GMT [176] LOG: redirecting log output to logging collector process
2023-01-27 15:09:54.870 GMT [176] HINT: Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2023-01-27 15:09:54.870 GMT [176] LOG: starting PostgreSQL 15.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-27 15:09:54.871 GMT [176] LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-01-27 15:09:54.871 GMT [176] LOG: listening on IPv6 address "::", port 5432
2023-01-27 15:09:54.876 GMT [176] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-01-27 15:09:54.883 GMT [180] LOG: database system was shut down at 2023-01-27 15:09:46 GMT
2023-01-27 15:09:54.894 GMT [176] LOG: database system is ready to accept connections
done
server started
postgresql-repmgr 15:09:54.95 INFO ==> Creating repmgr user: repmgr
2023-01-27 15:09:54.989 GMT [191] ERROR: role "repmgr" already exists
2023-01-27 15:09:54.989 GMT [191] STATEMENT: CREATE ROLE "repmgr" WITH LOGIN CREATEDB PASSWORD 'repmgrpassword';
ERROR: role "repmgr" already exists
ALTER ROLE
ALTER ROLE
postgresql-repmgr 15:09:55.05 INFO ==> Creating repmgr database: repmgr
ERROR: database "repmgr" already exists
2023-01-27 15:09:55.078 GMT [204] ERROR: database "repmgr" already exists
2023-01-27 15:09:55.078 GMT [204] STATEMENT: CREATE DATABASE repmgr;
postgresql-repmgr 15:09:55.08 INFO ==> Unregistering witness node...
NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"
ERROR: unable to connect to primary
DETAIL:
connection pointer is NULL
postgresql-repmgr 15:09:55.12 INFO ==> Registering witness node...
postgresql-repmgr 15:09:55.12 INFO ==> Waiting for primary node...
postgresql-repmgr 15:09:55.13 DEBUG ==> Wait for schema repmgr.repmgr on 'pg-0:5432', will try 6 times with 10 delay seconds (TIMEOUT=60)
postgresql-repmgr 15:09:55.16 DEBUG ==> Schema repmgr.repmgr exists!
NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"
ERROR: node "pgw-0" (ID: 1000) is already registered as a primary node
HINT: use "repmgr primary unregister" to remove a non-witness node record
postgresql-repmgr 15:09:55.24 INFO ==> Stopping PostgreSQL...
waiting for server to shut down....2023-01-27 15:09:55.255 GMT [176] LOG: received fast shutdown request
2023-01-27 15:09:55.259 GMT [176] LOG: aborting any active transactions
2023-01-27 15:09:55.262 GMT [176] LOG: background worker "logical replication launcher" (PID 184) exited with exit code 1
2023-01-27 15:09:55.262 GMT [178] LOG: shutting down
2023-01-27 15:09:55.313 GMT [178] LOG: checkpoint starting: shutdown immediate
2023-01-27 15:09:55.336 GMT [178] LOG: checkpoint complete: wrote 5 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.005 s, sync=0.005 s, total=0.025 s; sync files=4, longest=0.003 s, average=0.002 s; distance=16384 kB, estimate=16384 kB
2023-01-27 15:09:55.345 GMT [176] LOG: database system is shut down
done
server stopped
@rafariossaa can you help?
@rafariossaa I can't get the witness node running. It always errors out with the message that it is already registered as a primary node.
version: '3.8'
services:
  pg-0:
    image: bitnami/postgresql-repmgr:latest
    ports:
      - 6430:5432
    volumes:
      - /docker/local/database_repmgr2/pg-0:/bitnami/postgresql
    environment:
      - POSTGRESQL_POSTGRES_PASSWORD=adminpassword
      - POSTGRESQL_USERNAME=customuser
      - POSTGRESQL_PASSWORD=custompassword
      - POSTGRESQL_DATABASE=customdatabase
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_PRIMARY_PORT=5432
      - REPMGR_PARTNER_NODES=pg-0:5432,pg-1:5432,pgw-0:5432
      - REPMGR_NODE_NAME=pg-0
      - REPMGR_NODE_NETWORK_NAME=pg-0
      - REPMGR_PORT_NUMBER=5432
    deploy:
      resources:
        limits:
          memory: 5G
      replicas: 1
      placement:
        max_replicas_per_node: 1
  pg-1:
    image: bitnami/postgresql-repmgr:latest
    ports:
      - 6431:5432
    volumes:
      - /docker/local/database_repmgr2/pg-1:/bitnami/postgresql
    environment:
      - POSTGRESQL_POSTGRES_PASSWORD=adminpassword
      - POSTGRESQL_USERNAME=customuser
      - POSTGRESQL_PASSWORD=custompassword
      - POSTGRESQL_DATABASE=customdatabase
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_PRIMARY_PORT=5432
      - REPMGR_PARTNER_NODES=pg-0:5432,pg-1:5432,pgw-0:5432
      - REPMGR_NODE_NAME=pg-1
      - REPMGR_NODE_NETWORK_NAME=pg-1
      - REPMGR_PORT_NUMBER=5432
    deploy:
      resources:
        limits:
          memory: 5G
      replicas: 1
      placement:
        max_replicas_per_node: 1
  pgw-0:
    image: bitnami/postgresql-repmgr:latest
    ports:
      - 6439:5432
    volumes:
      - /docker/local/database_repmgr2/pgw-0:/bitnami/postgresql
    environment:
      - POSTGRESQL_POSTGRES_PASSWORD=adminpassword
      - POSTGRESQL_USERNAME=customuser
      - POSTGRESQL_PASSWORD=custompassword
      - POSTGRESQL_DATABASE=customdatabase
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_PRIMARY_PORT=5432
      - REPMGR_PARTNER_NODES=pg-0:5432,pg-1:5432,pgw-0:5432
      - REPMGR_NODE_NAME=pgw-0
      - REPMGR_NODE_NETWORK_NAME=pgw-0
      - REPMGR_PORT_NUMBER=5432
      - REPMGR_NODE_TYPE=witness
      - BITNAMI_DEBUG=true
    deploy:
      resources:
        limits:
          memory: 5G
      replicas: 1
      placement:
        max_replicas_per_node: 1
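The two compose files posted in this thread differ in the witness service's REPMGR_PRIMARY_HOST (pgw-0 in the first, pg-0 in the second) and in REPMGR_PARTNER_NODES. As a rough aid for spotting such differences, here is a small sketch that compares REPMGR_* values across services. The helper names `env_to_dict` and `check_repmgr_env` are hypothetical, the checks are heuristics rather than rules taken from the image's documentation, and nothing in this thread establishes that these settings cause the reported failure.

```python
def env_to_dict(env_list):
    """Turn compose-style ['KEY=value', ...] entries into a dict."""
    return dict(item.split("=", 1) for item in env_list)


def check_repmgr_env(services):
    """Return warnings about REPMGR_* settings that look inconsistent.

    Heuristic only: a difference between services is not necessarily an error.
    """
    warnings = []
    envs = {name: env_to_dict(env) for name, env in services.items()}
    for key in ("REPMGR_PRIMARY_HOST", "REPMGR_PARTNER_NODES"):
        values = {name: env.get(key) for name, env in envs.items()}
        if len(set(values.values())) > 1:
            warnings.append(f"{key} differs between services: {values}")
    for name, env in envs.items():
        is_witness = env.get("REPMGR_NODE_TYPE") == "witness"
        if is_witness and env.get("REPMGR_PRIMARY_HOST") == env.get("REPMGR_NODE_NAME"):
            warnings.append(f"{name}: witness lists itself as REPMGR_PRIMARY_HOST")
    return warnings


def make_services(witness_primary, partners):
    """Build the REPMGR_* subset of the three services used in this thread."""
    services = {
        name: [f"REPMGR_PRIMARY_HOST={primary}",
               f"REPMGR_PARTNER_NODES={partners}",
               f"REPMGR_NODE_NAME={name}"]
        for name, primary in (("pg-0", "pg-0"), ("pg-1", "pg-0"),
                              ("pgw-0", witness_primary))
    }
    services["pgw-0"].append("REPMGR_NODE_TYPE=witness")
    return services


first_compose = make_services("pgw-0", "pg-0:5432,pg-1:5432")
second_compose = make_services("pg-0", "pg-0:5432,pg-1:5432,pgw-0:5432")

for warning in check_repmgr_env(first_compose):
    print(warning)
print(check_repmgr_env(second_compose))  # []
```

The second compose file passes both checks, yet the witness is still reported as failing with the same "already registered as a primary node" message, so these settings alone do not appear to explain the problem.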
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
No, this issue is still not solved!
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
I am closing this in favor of https://github.com/bitnami/containers/issues/27124