
When the master node starts again, postgresql cannot start normally

Open yaoyao6 opened this issue 4 years ago • 6 comments

Description: The master DB and the slave are deployed on different servers, and both synchronize data normally. When the primary node shuts down, the standby node takes over as expected. But when the former master node starts again, PostgreSQL cannot start normally.

Steps to reproduce the issue:

  1. Master DB [A] and slave [B] are deployed on different servers. Both synchronize data normally.
  2. A shuts down, and B switches to master DB.
  3. A starts again, but PostgreSQL cannot start.

Describe the results you received: log

pam-pgsql-0_1             | postgresql-repmgr 08:43:50.93 INFO  ==> ** Starting PostgreSQL with Replication Manager setup **
pam-pgsql-0_1             | postgresql-repmgr 08:43:50.96 INFO  ==> Validating settings in REPMGR_* env vars...
pam-pgsql-0_1             | postgresql-repmgr 08:43:50.97 INFO  ==> Validating settings in POSTGRESQL_* env vars..
pam-pgsql-0_1             | postgresql-repmgr 08:43:50.98 INFO  ==> Querying all partner nodes for common upstream node...
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.05 INFO  ==> Auto-detected primary node: '10.47.154.107:5432'
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.06 INFO  ==> Preparing PostgreSQL configuration...
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.07 INFO  ==> postgresql.conf file not detected. Generating it...
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.13 INFO  ==> Preparing repmgr configuration...
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.13 INFO  ==> Initializing Repmgr...
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.14 INFO  ==> Waiting for primary node...
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.17 INFO  ==> Cloning data from primary node...
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.88 INFO  ==> Initializing PostgreSQL database...
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.89 INFO  ==> Cleaning stale /bitnami/postgresql/data/standby.signal file
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.90 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.90 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.92 INFO  ==> Deploying PostgreSQL with persisted data...
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.94 INFO  ==> Configuring replication parameters
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.96 INFO  ==> Configuring fsync
pam-pgsql-0_1             | postgresql-repmgr 08:43:51.99 INFO  ==> Setting up streaming replication slave...
pam-pgsql-0_1             | postgresql-repmgr 08:43:52.02 INFO  ==> Starting PostgreSQL in background...
pam-pgsql-0_1             | postgresql-repmgr 08:43:52.16 INFO  ==> Unregistering standby node...
pam-pgsql-0_1             | postgresql-repmgr 08:43:52.28 INFO  ==> Registering Standby node...
pam-pgsql-0_1             | postgresql-repmgr 08:43:52.33 INFO  ==> Stopping PostgreSQL...
pam-pgsql-0_1             | postgresql-repmgr 08:43:53.35 INFO  ==> ** PostgreSQL with Replication Manager setup finished! **
pam-pgsql-0_1             | 
pam-pgsql-0_1             | postgresql-repmgr 08:43:53.38 INFO  ==> Starting PostgreSQL in background...
pam-pgsql-0_1             | postgresql-repmgr 08:43:53.52 INFO  ==> ** Starting repmgrd **
pam-pgsql-0_1             | [2020-10-17 08:43:53] [NOTICE] repmgrd (repmgrd 5.1.0) starting up
pam-pgsql-0_1             | [2020-10-17 08:43:53] [ERROR] PID file "/opt/bitnami/repmgr/tmp/repmgr.pid" exists and seems to contain a valid PID
pam-pgsql-0_1             | [2020-10-17 08:43:53] [HINT] if repmgrd is no longer alive, remove the file and restart repmgrd
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.70 
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.70 Welcome to the Bitnami postgresql-repmgr container
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.70 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.70 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.71 
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.72 INFO  ==> ** Starting PostgreSQL with Replication Manager setup **
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.75 INFO  ==> Validating settings in REPMGR_* env vars...
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.76 INFO  ==> Validating settings in POSTGRESQL_* env vars..
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.76 INFO  ==> Querying all partner nodes for common upstream node...
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.83 INFO  ==> Auto-detected primary node: '10.47.154.107:5432'
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.84 INFO  ==> Preparing PostgreSQL configuration...
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.84 INFO  ==> postgresql.conf file not detected. Generating it...
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.91 INFO  ==> Preparing repmgr configuration...
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.92 INFO  ==> Initializing Repmgr...
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.92 INFO  ==> Waiting for primary node...
pam-pgsql-0_1             | postgresql-repmgr 08:43:55.96 INFO  ==> Cloning data from primary node...
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.68 INFO  ==> Initializing PostgreSQL database...
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.68 INFO  ==> Cleaning stale /bitnami/postgresql/data/standby.signal file
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.69 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.69 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.71 INFO  ==> Deploying PostgreSQL with persisted data...
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.73 INFO  ==> Configuring replication parameters
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.75 INFO  ==> Configuring fsync
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.77 INFO  ==> Setting up streaming replication slave...
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.80 INFO  ==> Starting PostgreSQL in background...
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.92 INFO  ==> Unregistering standby node...
pam-pgsql-0_1             | postgresql-repmgr 08:43:56.98 INFO  ==> Registering Standby node...
pam-pgsql-0_1             | postgresql-repmgr 08:43:57.03 INFO  ==> Stopping PostgreSQL...
pam-pgsql-0_1             | postgresql-repmgr 08:43:58.05 INFO  ==> ** PostgreSQL with Replication Manager setup finished! **
pam-pgsql-0_1             | 
pam-pgsql-0_1             | postgresql-repmgr 08:43:58.07 INFO  ==> Starting PostgreSQL in background...
pam-pgsql-0_1             | postgresql-repmgr 08:43:58.20 INFO  ==> ** Starting repmgrd **
pam-pgsql-0_1             | [2020-10-17 08:43:58] [NOTICE] repmgrd (repmgrd 5.1.0) starting up
pam-pgsql-0_1             | [2020-10-17 08:43:58] [ERROR] PID file "/opt/bitnami/repmgr/tmp/repmgr.pid" exists and seems to contain a valid PID
pam-pgsql-0_1             | [2020-10-17 08:43:58] [HINT] if repmgrd is no longer alive, remove the file and restart repmgrd
pam-pgsql-0_1             | postgresql-repmgr 08:44:01.82 
pam-pgsql-0_1             | postgresql-repmgr 08:44:01.82 Welcome to the Bitnami postgresql-repmgr container

Describe the results you expected: A starts again, postgresql starts and reconnects as a standby node.

Additional information you deem important (e.g. issue happens only occasionally): When I remove /opt/bitnami/repmgr/tmp/repmgr.pid, everything goes back to normal.

yaoyao6, Oct 23 '20 08:10

Hi,

This is weird

pam-pgsql-0_1 | [2020-10-17 08:43:53] [ERROR] PID file "/opt/bitnami/repmgr/tmp/repmgr.pid" exists and seems to contain a valid PID

We have dealt with race condition issues in the past, but this one does not seem to be related. Pinging @rafariossaa, as he has dealt with startup issues before.

javsalgar, Oct 23 '20 08:10

Changing - REPMGR_PARTNER_NODES=10.47.154.106,10.47.154.107 to - REPMGR_PARTNER_NODES=10.47.154.106,10.47.154.107:5432 solved my problem.

yaoyao6, Feb 03 '21 04:02

- REPMGR_PARTNER_NODES=10.47.154.106,10.47.154.107:5432 solved my problem.

Good to hear, but this cannot be a solution. The example docker-compose.yml is misleading. You are free to specify a port, but it is not mandatory to add the port to the last host entry: - REPMGR_PARTNER_NODES=pg-0,pg-1:5432

The entries here are evaluated in librepmgr.sh. If no port is specified for a host, the default port is added for that host, as in this line from librepmgr.sh: port="${port:-$REPMGR_PRIMARY_PORT}"
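To illustrate why the explicit :5432 should make no difference, here is a minimal sketch of that kind of default-port handling. The function name normalize_partner_nodes is hypothetical and this is not the actual librepmgr.sh code; only the port="${port:-...}" fallback pattern is taken from it. It assumes plain host or host:port entries (no IPv6 literals).

```shell
#!/bin/sh
# Hypothetical helper, NOT the real librepmgr.sh implementation.
# Prints one "host:port" line per entry of a comma-separated partner list,
# falling back to a default port when an entry does not specify one.
# The subshell body "( ... )" keeps the IFS and globbing changes local.
normalize_partner_nodes() (
    nodes="$1"
    default_port="${2:-5432}"
    IFS=','
    set -f  # disable globbing while the list is split on commas
    for node in $nodes; do
        case "$node" in
            *:*) host="${node%%:*}"; port="${node#*:}" ;;
            *)   host="$node"; port="" ;;
        esac
        # Same fallback pattern as the line quoted from librepmgr.sh
        port="${port:-$default_port}"
        printf '%s:%s\n' "$host" "$port"
    done
)

normalize_partner_nodes "10.47.154.106,10.47.154.107:5432" 5432
```

With this logic, 10.47.154.106,10.47.154.107 and 10.47.154.106,10.47.154.107:5432 resolve to the same two host:port pairs, which supports the point that adding the port is unlikely to be the real fix.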

A solution is still required to handle old repmgr PID files.

Since repmgrd 4.1 the parameter --pid-file is not required anymore: repmgrd-daemon.html. But this image here still uses it in run.sh: readonly repmgr_flags=("--pid-file=$REPMGR_PID_FILE" "-f" "$REPMGR_CONF_FILE" "--daemonize=false")

If repmgrd is started with --pid-file pointing at a PID file left over from a previous run of the Docker container, it just exits with code 3. It does not delete the PID file (see repmgrd.c).

Approaches can be:

  • check for an old PID file and remove it. This is already implemented for the psql pid file.
  • remove the --pid-file parameter and let repmgr try to manage the pid file on its own. An alternative can be to use --no-pid-file in this docker environment?
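The first approach could look roughly like the sketch below. The function name cleanup_repmgr_pid_file is hypothetical; REPMGR_PID_FILE is the variable already used in run.sh. It assumes the function is called once at container start, before repmgrd is launched, so that no repmgrd from a previous run can still be alive. The file is removed unconditionally rather than after a liveness check, because the PID stored in it may coincidentally match an unrelated process in the new container, which would explain the "seems to contain a valid PID" error.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the Bitnami scripts.
# Removes a leftover repmgrd PID file before the daemon is started.
# Assumption: this runs at container start, so any existing file is stale.
cleanup_repmgr_pid_file() {
    pid_file="${1:-${REPMGR_PID_FILE:-}}"
    if [ -n "$pid_file" ] && [ -f "$pid_file" ]; then
        echo "Removing stale PID file $pid_file" >&2
        rm -f "$pid_file"
    fi
}
```

It would be called as cleanup_repmgr_pid_file "$REPMGR_PID_FILE" right before the repmgrd command line in run.sh. The call is a no-op when the file does not exist.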

But why does repmgr not remove the PID file in the first place, if the system is rebooted?

https://github.com/EnterpriseDB/repmgr/issues/517#issuecomment-459602213

reduakt, Jun 23 '22 12:06

Hi @reduakt ,

Thanks for your feedback.

I have created a task so the team can review it. Unfortunately, I cannot provide you with an ETA.

dgomezleon, Jun 29 '22 15:06

Thanks for reporting this issue. Would you like to contribute by creating a PR to solve the issue? The Bitnami team will be happy to review it and provide feedback. Here you can find the contributing guidelines.

carrodher, Jun 29 '22 16:06

We are going to transfer this issue to bitnami/containers

In order to unify the approaches followed in Bitnami containers and Bitnami charts, we are moving some issues in bitnami/bitnami-docker-<container> repositories to bitnami/containers.

Please follow bitnami/containers to stay updated about the latest Bitnami images.

More information here: https://blog.bitnami.com/2022/07/new-source-of-truth-bitnami-containers.html

carrodher, Jul 28 '22 13:07

Hi, a new release that removes the PID file has just been made. Could you give it a try?

rafariossaa, Sep 16 '22 15:09

I am closing this issue. If you find further errors, please don't hesitate to reopen it.

rafariossaa, Sep 21 '22 08:09