repmgr icon indicating copy to clipboard operation
repmgr copied to clipboard

repmgr version 4.4 prooting standby to primary is not working properly

Open inetplus opened this issue 6 years ago • 15 comments

Dear RepMgr Team,

i installed repmgr version 4.4 and postgres version 11.4. i identified one issue while promoting standby to primary.

I am gettting always the following error message [ERROR] unable to promote server from standby to primary this error is getting from pg_ctl pg_ctl: PID file "/var/lib/pgsql/11/data/postmaster.pid" does not exist

The reason of this error message - as far as i understood - is that this postmaster.pid is available in the data folder not in configuration folder.

I searched a little bit and found that the code repmgr-action-node.c the line number 2060 says that the data directory is needed but getting the config file instead.

i hope it helps you to fix this issue and publish new release/patch for that?

inetplus avatar Aug 21 '19 15:08 inetplus

First we need to be able to reproduce the issue.

pg_ctl: PID file "/var/lib/pgsql/11/data/postmaster.pid" does not exist

The reason of this error message - as far as i understood - is that this postmaster.pid is available in the data folder not in configuration folder.

/var/lib/pgsql/11/data looks like the data directory, so for some unknown reason pg_ctl is not finding the PID file where it expects to.

To assist us with reproducing diagnosing this issue, please provide the following information:

  • how was repmgr installed (from source, or a package - and if so which?)
  • output of SELECT * FROM repmgr.nodes
  • contents of the repmgr.conf file on the standby to be promoted
  • output of the following query on the standby to be promoted: SELECT * FROM pg_settings WHERE name = 'data_directory'; (note: query requires superuser rights, or that the user is a member of the role pg_read_all_settings)
  • if possible, log output from the failed promotion attempt

ibarwick avatar Aug 22 '19 01:08 ibarwick

Hi,

thanks for quick response and please find below the answers for your queries

  • [ x] repmgr is installed from package

Installed Packages Name : repmgr11 Arch : x86_64 Version : 4.4.0 Release : 1.rhel7 Size : 1.0 M Repo : installed From repo : pgdg11 Summary : Replication Manager for PostgreSQL Clusters URL : https://www.repmgr.org

  • [ x] Output of select statement from primary:

SELECT * FROM repmgr.nodes ; node_id | upstream_node_id | active | node_name | type | location | priority | conninfo | repluser | slot_name | config_file ---------+------------------+--------+------------------+---------+----------+----------+-------------------------------------------------------------------+----------+-----------+---------------------------- 1 | | t | fra2intdpamdb001 | primary | default | 100 | host=fra2intdpamdb001 user=repmgr dbname=repmgr connect_timeout=2 | repmgr | | /etc/repmgr/11/repmgr.conf 2 | 1 | t | fra2intdpamdb002 | standby | default | 100 | host=fra2intdpamdb002 user=repmgr dbname=repmgr connect_timeout=2 | repmgr | | /etc/repmgr/11/repmgr.conf (2 rows)

  • Output of select statement from standby:

SELECT * FROM repmgr.nodes ; node_id | upstream_node_id | active | node_name | type | location | priority | conninfo | repluser | slot_name | config_file ---------+------------------+--------+------------------+---------+----------+----------+-------------------------------------------------------------------+----------+-----------+---------------------------- 1 | | t | fra2intdpamdb001 | primary | default | 100 | host=fra2intdpamdb001 user=repmgr dbname=repmgr connect_timeout=2 | repmgr | | /etc/repmgr/11/repmgr.conf 2 | 1 | t | fra2intdpamdb002 | standby | default | 100 | host=fra2intdpamdb002 user=repmgr dbname=repmgr connect_timeout=2 | repmgr | | /etc/repmgr/11/repmgr.conf (2 rows)

  • [x ] Contents of repmgr.conf file

node_id=2
node_name='fra2intdpamdb002'
conninfo='host=fra2intdpamdb002 user=repmgr dbname=repmgr connect_timeout=2' data_directory='/mnt/mydbDirectory/pgsql/11/data' config_directory='/var/lib/pgsql/11/data' log_file='/var/log/repmgr/repmgr.log' pg_bindir='/usr/pgsql-11/bin/' ssh_options='-q -o ConnectTimeout=10' failover=automatic reconnect_attempts=6
reconnect_interval=10
promote_command='/usr/pgsql-11/bin/repmgr standby promote -f /etc/repmgr/11/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-11/bin/repmgr standby follow -f /etc/repmgr/11/repmgr.conf --log-to-file'

  • [x ] output of the following query on the standby to be promoted: SELECT * FROM pg_settings WHERE name = 'data_directory';

    name      |          setting          | unit |    category    |            short_desc             | extra_desc |  context   | vartype |  source  | min_val | max_val | enumvals | boot_val |         reset_val
    

    | sourcefile | sourceline | pending_restart ----------------+---------------------------+------+----------------+-----------------------------------+------------+------------+---------+----------+---------+---------+----------+----------+---------------------- -----+------------+------------+----------------- data_directory | /mnt/mydbDirectory/pgsql/11/data | | File Locations | Sets the server's data directory. | | postmaster | string | override | | | | | /mnt/mydbDirectory/pgsql/11/ data | | | f (1 row)

  • [x ] log output

[2019-08-21 16:41:40] [NOTICE] this node is the only available candidate and will now promote itself [2019-08-21 16:41:40] [INFO] promote_command is: "/usr/pgsql-11/bin/repmgr standby promote -f /etc/repmgr/11/repmgr.conf --log-to-file" [2019-08-21 16:41:40] [NOTICE] redirecting logging output to "/var/log/repmgr/repmgr.log"

[2019-08-21 16:41:40] [NOTICE] promoting standby to primary [2019-08-21 16:41:40] [DETAIL] promoting server "fra2intdpamdb002" (ID: 2) using "/usr/pgsql-11/bin/pg_ctl -D /mnt/mydbDirectory/pgsql/11/data -w -D '/var/lib/pgsql/11/data' promote" pg_ctl: PID file "/var/lib/pgsql/11/data/postmaster.pid" does not exist Is server running? [2019-08-21 16:41:40] [ERROR] unable to promote server from standby to primary [2019-08-21 16:41:40] [INFO] searching for primary node [2019-08-21 16:41:40] [INFO] checking if node 1 is primary [2019-08-21 16:41:40] [ERROR] connection to database failed [2019-08-21 16:41:40] [DETAIL] could not connect to server: Connection refused Is the server running on host "fra2intdpamdb001" (100.65.x.xx) and accepting TCP/IP connections on port 5432?

inetplus avatar Aug 22 '19 06:08 inetplus

Please let me know when you need any other information or support from my side

thanks

inetplus avatar Aug 22 '19 06:08 inetplus

Thanks. Could you also attach the output of:

SELECT * FROM pg_file_settings WHERE name = 'data_directory'\gx

ibarwick avatar Aug 22 '19 07:08 ibarwick

here is the result:

postgres=# SELECT * FROM pg_file_settings WHERE name = 'data_directory'; -[ RECORD 1 ]-------------------------------------- sourcefile | /var/lib/pgsql/11/data/postgresql.conf sourceline | 41 seqno | 1 name | data_directory setting | /mnt/mydbDirectory/pgsql/11/data applied | t error |

inetplus avatar Aug 22 '19 07:08 inetplus

Thanks.

Next one: please provide the output of the following commands:

pg_ctl -D /var/lib/pgsql/11/data/ status pg_ctl -D /mnt/mydbDirectory/pgsql/11/data

ibarwick avatar Aug 22 '19 07:08 ibarwick

This is odd - you have two -D parameters in the pg_ctl command:

[2019-08-21 16:41:40] [DETAIL] promoting server "fra2intdpamdb002" (ID: 2) using "/usr/pgsql-11/bin/pg_ctl -D /mnt/mydbDirectory/pgsql/11/data -w -D '/var/lib/pgsql/11/data' promote"

Please attach the repmgr.conf file for this too.

FYI the following command should work:

/usr/pgsql-11/bin/pg_ctl -w -D '/var/lib/pgsql/11/data' promote

Note that pg_ctl's -D option, somewhat counterintuitively, should point to the directory where the configuration files are, which is not necessarily the data directory; see:

https://www.postgresql.org/docs/current/app-pg-ctl.html

ibarwick avatar Aug 22 '19 08:08 ibarwick

/usr/pgsql-11/bin/pg_ctl -D /var/lib/pgsql/11/data/ status pg_ctl: no server running

/usr/pgsql-11/bin/pg_ctl -D /mnt/mydbDirectory/pgsql/11/data/ status pg_ctl: server is running (PID: 14425)

inetplus avatar Aug 22 '19 08:08 inetplus

I see that i copied the latest test log [2019-08-21 16:41:40] [DETAIL] promoting server "fra2intdpamdb002" (ID: 2) using "/usr/pgsql-11/bin/pg_ctl -D /mnt/mydbDirectory/pgsql/11/data -w -D '/var/lib/pgsql/11/data' promote"

you see the parameter -D two times. The reason is that i had updated for testing purposes the attribute pg_ctl_options in repmgr.conf file. it did not work and i removed this value again.

Please find below content from log file without this changes [2019-08-21 13:46:03] [NOTICE] redirecting logging output to "/var/log/repmgr/repmgr.log"

[2019-08-21 13:46:03] [NOTICE] promoting standby to primary [2019-08-21 13:46:03] [DETAIL] promoting server "fra2intdpamdb002" (ID: 2) using "/usr/pgsql-11/bin/pg_ctl -w -D '/var/lib/pgsql/11/data' promote" pg_ctl: PID file "/var/lib/pgsql/11/data/postmaster.pid" does not exist Is server running? [2019-08-21 13:46:03] [ERROR] unable to promote server from standby to primary [2019-08-21 13:46:03] [INFO] searching for primary node [2019-08-21 13:46:03] [INFO] checking if node 1 is primary [2019-08-21 13:46:03] [ERROR] connection to database failed

inetplus avatar Aug 22 '19 08:08 inetplus

when i execute the command

/usr/pgsql-11/bin/pg_ctl -w -D '/var/lib/pgsql/11/data' promote

/usr/pgsql-11/bin/pg_ctl -w -D '/var/lib/pgsql/11/data' promote pg_ctl: PID file "/var/lib/pgsql/11/data/postmaster.pid" does not exist Is server running?

inetplus avatar Aug 22 '19 08:08 inetplus

just one more information.

the file postmaster.pid exists in the data folder /mnt/mydbDirectory/pgsql/11/data

i hope it helps a little bit more.

inetplus avatar Aug 22 '19 09:08 inetplus

do u have any feedback for me how to continue? thanks

inetplus avatar Aug 23 '19 11:08 inetplus

Hi i have read all information from this link that you provide me: https://www.postgresql.org/docs/current/app-pg-ctl.html

it is written that the parameter -D points to data directory. PGDATA in the environment variable shows to Default data directory location. AND under the section Files it is written that the file postmaster.pid pg_ctl examines this file in the data directory to determine whether the server is currently running.

inetplus avatar Aug 24 '19 19:08 inetplus

I haven't been able to reproduce this issue yet.

The only thing I can think of right now is that as you are using RHEL7 (or a variant thereof), and have installed repmgr from packages, and appear to have put the configuration files where the data directory normally is, then put the data directory somewhere else, it's conceivable systemd is causing things to get messed up somehow (unless you've taken steps to disable the systemd side of things).

A cleaner approach would probably just be to create a symlink from the standard data directory location to the new location, that way you can continue to use the systemd functionality without needing to modify things. In particular we recommend configuring repmgr to use systemd's service commands, rather than falling back to pg_ctl; for details see the documentation here: https://repmgr.org/docs/current/configuration-file-service-commands.html

ibarwick avatar Aug 26 '19 05:08 ibarwick

I am not sure whether i understood it properly because of that i would like to summary this issue from my point of view.

PostgresSQL 11 is installed in CENTOD Version 7. The data and configuraton files are in /var/lib/pgsql/11/data i changed the data folder to /mnt/mydbDirectory/pgsql/11/data the configuration file is still in the folder /var/lib/pgsql/11/data.

the psql shows also the right config file and right data directory while executing the command SHOW data_directory etc.

I see your point that the proposed solution from your side could solve this issue (need to be tested from my end)

However after i read the documentation provided by Postgres and checked the implementation by repmgr - i am convinced that it makes more sense to double check the code in repmgr.

However i am really approciate to get support from you.

When you need any other support from my side please let me know. otherwise i will double check whether it will work as you described. otherwise i have no choice to change the solution. i let you know how is went.

thanks for your support & Cheers

inetplus avatar Aug 26 '19 07:08 inetplus