repmgrd 4.2 heavily swapping out (pgsql 11.2)

Hi,

I've noticed that one of the servers in my three-instance postgres cluster is using about half of the available swap space because of repmgrd. Setup: RHEL 7.6, postgres 11.2 and repmgrd 4.2. Tail of the smem -k output:

...

 8157 postgres /usr/pgsql-11/bin/postmaste    56.0M   348.0K    60.2M   294.6M
20676 postgres postgres: autovacuum worker    55.9M    11.6M    61.4M   260.3M
19116 postgres postgres: autovacuum worker    55.9M    11.7M    69.5M   288.2M
11245 postgres /usr/pgsql-11/bin/repmgrd -    13.7G    97.8M    97.9M    98.7M
 7866 postgres postgres: autovacuum worker    55.9M    11.6M   108.0M   395.1M
 5217 pcp      /usr/libexec/pcp/bin/pmwebd    11.9M   169.3M   169.5M   171.4M
 8171 postgres postgres: background writer    56.2M   592.0K     7.4G    15.2G
 8170 postgres postgres: checkpointer         57.8M    58.9M     7.4G    15.2G

# free -m
              total        used        free      shared  buff/cache   available
Mem:          64262        2273         388       15742       61600       45650
Swap:         32766       18324       14442
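
For anyone wanting to cross-check the smem figures without smem installed, here is a minimal sketch that ranks processes by swap usage. It assumes a Linux /proc filesystem where /proc/<pid>/status carries a "VmSwap:" line in kB; the function names (vmswap_kb, top_swap_users) are made up for this example and are not part of repmgr or smem.

```python
import os
import re


def vmswap_kb(status_text):
    """Return the VmSwap value in kB from the text of /proc/<pid>/status.

    Kernel threads have no VmSwap line; they are reported as 0.
    """
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def top_swap_users(limit=10, proc_root="/proc"):
    """Return up to `limit` tuples (swap_kb, pid, name), largest first."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to report
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited in the meantime, or not readable
        name = re.search(r"^Name:\s+(.*)$", text, re.MULTILINE)
        rows.append((vmswap_kb(text), int(entry), name.group(1) if name else "?"))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for swap_kb, pid, name in top_swap_users():
        print(f"{pid:>7} {name:<20} {swap_kb / 1024:10.1f} MiB")
```

Running this periodically (for example from cron) and watching whether the repmgrd value keeps growing would show whether the swap usage is a steady leak or a one-off.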

Has anyone come across this issue? I'm using a default configuration as described in the documentation.

May 08 '19 13:05 azet

Can you provide a) repmgr.conf and b) the repmgrd log file (or relevant-looking excerpts from it)?

May 08 '19 14:05 ibarwick

repmgr.conf:

###################################################
# repmgr sample configuration file
###################################################

# Some configuration items will be set with a default value; this
# is noted for each item. Where no default value is shown, the
# parameter will be treated as empty or false.

# =============================================================================
# Required configuration items
# =============================================================================
#
# repmgr and repmgrd require the following items to be explicitly configured.


node_id=1                        # A unique integer greater than zero
node_name='hostname01'                        # An arbitrary (but unique) string; we recommend
                                 # using the server's hostname or another identifier
                                 # unambiguously associated with the server to avoid
                                 # confusion. Avoid choosing names which reflect the
                                 # node's current role, e.g. "primary" or "standby1",
                                 # as roles can change and it will be confusing if
                                 # the current primary is called "standby1".

conninfo='host=xxx port=5432 user=xxx password=xxx'                      # Database connection information as a conninfo string.
                                 # All servers in the cluster must be able to connect to
                                 # the local node using this string.
                                 #
                                 # For details on conninfo strings, see:
                                 #  https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-CONNSTRING
                                 #
                                 # If repmgrd is in use, consider explicitly setting
                                 # "connect_timeout" in the conninfo string to determine
                                 # the length of time which elapses before a network
                                 # connection attempt is abandoned; for details see:
                                 #  https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-CONNECT-CONNECT-TIMEOUT

data_directory='/var/lib/pgsql/11/data'          # The node's data directory. This is needed by repmgr
                                 # when performing operations when the PostgreSQL instance
                                 # is not running and there's no other way of determining
                                 # the data directory.


failover=automatic
promote_command='/usr/pgsql-11/bin/repmgr standby promote -f /etc/repmgr/11/repmgr.conf --log-to-file'
follow_command=''/usr/pgsql-11/bin/repmgr standby follow -f /etc/repmgr/11/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes


# =============================================================================

# Optional configuration items
# =============================================================================


#------------------------------------------------------------------------------
# Server settings
#------------------------------------------------------------------------------

#config_directory=''             # If configuration files are located outside the data
                                 # directory, specify the directory where the main
                                 # postgresql.conf file is located.

#------------------------------------------------------------------------------
# Replication settings
#------------------------------------------------------------------------------

#replication_user='repmgr'       # User to make replication connections with, if not set defaults
                                 # to the user defined in "conninfo".

replication_type=physical        # Must be one of 'physical' or 'bdr'.

#location=default                # arbitrary string defining the location of the node; this
                                 # is used during failover to check visibility of the
                                 # current primary node. See the 'repmgrd' documentation
                                 # in README.md for further details.

#use_replication_slots=no        # whether to use physical replication slots
                                 # NOTE: when using replication slots,
                                 # 'max_replication_slots' should be configured for
                                 # at least the number of standbys which will connect
                                 # to the primary.

#------------------------------------------------------------------------------
# Witness server settings
#------------------------------------------------------------------------------

#witness_sync_interval=15        # interval (in seconds) to synchronise node records
                                 # to the witness server

#------------------------------------------------------------------------------
# Logging settings
#------------------------------------------------------------------------------
#
# Note that logging facility settings will only apply to `repmgrd` by default;
# `repmgr` will always write to STDERR unless the switch `--log-to-file` is
# supplied, in which case it will log to the same destination as `repmgrd`.
# This is mainly intended for those cases when `repmgr` is executed directly
# by `repmgrd`.

#log_level=INFO                  # Log level: possible values are DEBUG, INFO, NOTICE,
                                 # WARNING, ERROR, ALERT, CRIT or EMERG

#log_facility=STDERR             # Logging facility: possible values are STDERR, or for
                                 # syslog integration, one of LOCAL0, LOCAL1, ..., LOCAL7, USER

#log_file=''                     # STDERR can be redirected to an arbitrary file
#log_status_interval=300         # interval (in seconds) for repmgrd to log a status message


#------------------------------------------------------------------------------
# Event notification settings
#------------------------------------------------------------------------------

# event notifications can be passed to an arbitrary external program
# together with the following parameters:
#
#   %n - node ID
#   %e - event type
#   %s - success (1 or 0)
#   %t - timestamp
#   %d - details
#
# the values provided for "%t" and "%d" will probably contain spaces,
# so should be quoted in the provided command configuration, e.g.:
#
#   event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'
#
# By default, all notifications will be passed; the notification types
# can be filtered to explicitly named ones, e.g.:
#
#   event_notifications=primary_register,standby_register

#event_notification_command=''          # An external program or script which
                                        # can be executed by the user under which
                                        # repmgr/repmgrd are run.

#event_notifications=''                 # A comma-separated list of notification
                                        # types

#------------------------------------------------------------------------------
# Environment/command settings
#------------------------------------------------------------------------------

pg_bindir='/usr/pgsql-11/bin/'                          # Path to PostgreSQL binary directory (location
                                        # of pg_ctl, pg_basebackup etc.). Only needed
                                        # if these files are not in the system $PATH.
                                        #
                                        # Debian/Ubuntu users: you will probably need to
                                        # set this to the directory where `pg_ctl` is located,
                                        # e.g. /usr/lib/postgresql/9.6/bin/
                                        #
                                        # *NOTE* "pg_bindir" is only used when repmgr directly
                                        # executes PostgreSQL binaries; any user-defined scripts
                                        # *must* be specified with the full path

#repmgr_bindir=''                       # Path to repmgr binary directory (location of the repmgr
                                        # binary). Only needed if the repmgr executable is not in
                                        # the system $PATH or the path defined in "pg_bindir".

#use_primary_conninfo_password=false    # explicitly set "password" in recovery.conf's
                                        # "primary_conninfo" parameter using the value contained
                                        # in the environment variable PGPASSWORD
#passfile=''                            # path to .pgpass file to include in "primary_conninfo"
#------------------------------------------------------------------------------
# external command options
#------------------------------------------------------------------------------
#
# Options which can be passed to external commands invoked by repmgr/repmgrd.
#
# Examples:
#
#   pg_ctl_options='-s'
#   pg_basebackup_options='--label=repmgr_backup'
#   rsync_options=--archive --checksum --compress --progress --rsh="ssh -o \"StrictHostKeyChecking no\""
#   ssh_options=-o "StrictHostKeyChecking no"

#pg_ctl_options=''                      # Options to append to "pg_ctl"
#pg_basebackup_options=''               # Options to append to "pg_basebackup"
#rsync_options=''                       # Options to append to "rsync"
ssh_options='-q -o ConnectTimeout=10'   # Options to append to "ssh"



#------------------------------------------------------------------------------
# "standby clone" settings
#------------------------------------------------------------------------------
#
# These settings apply when cloning a standby ("repmgr standby clone").
#
# Examples:
#
#   tablespace_mapping=/path/to/original/tablespace=/path/to/new/tablespace
#   restore_command = 'cp /path/to/archived/wals/%f %p'

#tablespace_mapping=''                  # Tablespaces can be remapped from one
                                        # file system location to another. This
                                        # parameter can be provided multiple times.

#restore_command=''                     # This will be placed in the recovery.conf file generated
                                        # by repmgr.

#archive_cleanup_command=''             # This will be placed in the recovery.conf file generated
                                        # by repmgr. Note we recommend using Barman for managing
                                        # WAL archives (see: https://www.pgbarman.org )

#recovery_min_apply_delay=              # If provided, "recovery_min_apply_delay" in recovery.conf
                                        # will be set to this value (PostgreSQL 9.4 and later).


#------------------------------------------------------------------------------
# "standby promote" settings
#------------------------------------------------------------------------------

# These settings apply when instructing a standby to promote itself to the
# new primary ("repmgr standby promote").

#promote_check_timeout=60               # The length of time (in seconds) to wait
                                        # for the new primary to finish promoting
#promote_check_interval=1               # The interval (in seconds) to check whether
                                        # the new primary has finished promoting


#------------------------------------------------------------------------------
# "standby follow" settings
#------------------------------------------------------------------------------

# These settings apply when instructing a standby to follow the new primary
# ("repmgr standby follow").

#primary_follow_timeout=60              # The max length of time (in seconds) to wait
                                        # for the new primary to become available
#standby_follow_timeout=15              # The max length of time (in seconds) to wait
                                        # for the standby to connect to the primary


#------------------------------------------------------------------------------
# "standby switchover" settings
#------------------------------------------------------------------------------

# These settings apply when switching roles between a primary and a standby
# ("repmgr standby switchover").

#shutdown_check_timeout=60              # The max length of time (in seconds) to wait for the demotion
                                        # candidate (current primary) to shut down
#standby_reconnect_timeout=60           # The max length of time (in seconds) to wait
                                        # for the demoted standby to reconnect to the promoted
                                        # primary (note: this value should be equal to or greater
                                        # than that set for "node_rejoin_timeout")

#------------------------------------------------------------------------------
# "node rejoin" settings
#------------------------------------------------------------------------------

# These settings apply when reintegrating a node into a replication cluster
# with "repmgrd_node_rejoin"

#node_rejoin_timeout=60         # The maximum length of time (in seconds) to wait for
                                        # the node to reconnect to the replication cluster

#------------------------------------------------------------------------------
# Barman options
#------------------------------------------------------------------------------

#barman_server=''                       # The barman configuration section
#barman_host=''                         # The host name of the barman server
#barman_config=''                       # The Barman configuration file on the
                                        # Barman server (needed if the file is
                                        # in a non-standard location)

#------------------------------------------------------------------------------
# Failover and monitoring settings (repmgrd)
#------------------------------------------------------------------------------
#
# These settings are only applied when repmgrd is running. Values shown
# are defaults.

#repmgrd_pid_file=                      # Path of PID file to use for repmgrd; if not set, a PID file will
                                        # be generated in a temporary directory specified by the environment
                                        # variable $TMPDIR, or if not set, in "/tmp". This value can be overridden
                                        # by the command line option "-p/--pid-file"; the command line option
                                        # "--no-pid-file" will force PID file creation to be skipped.
#failover=manual                        # one of 'automatic', 'manual'.
                                        # determines what action to take in the event of upstream failure
                                        #
                                        # 'automatic': repmgrd will automatically attempt to promote the
                                        #    node or follow the new upstream node
                                        # 'manual': repmgrd will take no action and the node will require
                                        #    manual attention to reattach it to replication
                                        # (does not apply to BDR mode)

#priority=100                           # indicate a preferred priority for promoting nodes;
                                        # a value of zero prevents the node being promoted to primary
                                        # (default: 100)

#reconnect_attempts=6                   # Number of attempts which will be made to reconnect to an unreachable
                                        # primary (or other upstream node)
#reconnect_interval=10                  # Interval between attempts to reconnect to an unreachable
                                        # primary (or other upstream node)
#promote_command=                       # command repmgrd executes when promoting a new primary; use something like:
                                        #
                                        #     repmgr standby promote -f /etc/repmgr.conf
                                        #
#follow_command=                        # command repmgrd executes when instructing a standby to follow a new primary;
                                        # use something like:
                                        #
                                        #     repmgr standby follow -f /etc/repmgr.conf -W --upstream-node-id=%n
                                        #
#primary_notification_timeout=60        # Interval (in seconds) which repmgrd on a standby
                                        # will wait for a notification from the new primary,
                                        # before falling back to degraded monitoring
#repmgrd_standby_startup_timeout=60     # Interval (in seconds) which repmgrd on a standby will wait
                                        # for the local node to restart and become ready to accept connections after
                                        # executing "follow_command" (defaults to the value set in "standby_reconnect_timeout")

#monitoring_history=no                  # Whether to write monitoring data to the "monitoring_history" table
#monitor_interval_secs=2                # Interval (in seconds) at which to write monitoring data
#degraded_monitoring_timeout=-1         # Interval (in seconds) after which repmgrd will terminate if the
                                        # server being monitored is no longer available. -1 (default)
                                        # disables the timeout completely.
#async_query_timeout=60                 # Interval (in seconds) which repmgrd will wait before
                                        # cancelling an asynchronous query.

#------------------------------------------------------------------------------
# service control commands
#------------------------------------------------------------------------------
#
# repmgr provides options to override the default pg_ctl commands
# used to stop, start, restart, reload and promote the PostgreSQL cluster
#
# NOTE: These commands must be runnable on remote nodes as well for switchover
# to function correctly.
#
# If you use sudo, the user repmgr runs as (usually 'postgres')  must have
# passwordless sudo access to execute the command.
#
# For example, to use systemd, you can set
#
#    service_start_command = 'sudo systemctl start postgresql-9.6'
#    (...)
#
# and then use the following sudoers configuration:
#
#    # this is required when running sudo over ssh without -t:
#    Defaults:postgres !requiretty
#    postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-9.6, \
#       /usr/bin/systemctl start postgresql-9.6, \
#       /usr/bin/systemctl restart postgresql-9.6
#
# Debian/Ubuntu users: use "sudo pg_ctlcluster" to execute service control commands.
#
# For more details, see: https://repmgr.org/docs/4.1/configuration-service-commands.html

#service_start_command = ''
#service_stop_command = ''
#service_restart_command = ''
#service_reload_command = ''
#service_promote_command = ''           # This parameter is intended for systems which provide a
                                        # package-level promote command, such as Debian's
                                        # "pg_ctlcluster". *IMPORTANT*: it is *not* a substitute
                                        # for "promote_command"; do not use "repmgr standby promote"
                                        # (or a script which executes "repmgr standby promote") here.

#------------------------------------------------------------------------------
# Status check thresholds
#------------------------------------------------------------------------------

# Various warning/critical thresholds used by "repmgr node check".

#archive_ready_warning=16               # repmgr node check --archive-ready
#archive_ready_critical=128             #
                                        # Numbers of files pending archiving via PostgreSQL's
                                        # "archive_command" configuration parameter. If
                                        # files can't be archived fast enough, or the archive
                                        # command is failing, the buildup of files can
                                        # cause various issues, such as server shutdown being
                                        # delayed until all files are archived, or excessive
                                        # space being occupied by unarchived files.
                                        #
                                        # Note that these values will be checked when executing
                                        # "repmgr standby switchover" to warn about potential
                                        # issues with shutting down the demotion candidate.

#replication_lag_warning=300            # repmgr node check --replication-lag
#replication_lag_critical=600           #
                                        # Note that these values will be checked when executing
                                        # "repmgr standby switchover" to warn about potential
                                        # issues with shutting down the demotion candidate.


#------------------------------------------------------------------------------
# BDR monitoring options
#------------------------------------------------------------------------------

#bdr_local_monitoring_only=false         # Only monitor the local node; no checks will be
                                         # performed on the other node
#bdr_recovery_timeout                    # If a BDR node was offline and has become available
                                         # maximum length of time in seconds to wait for the
                                         # node to reconnect to the cluster

I can't find any relevant log entries apart from repmgrd starting up etc. The postgres log only shows connecting, disconnecting, ...

Last two entries:

May  8 16:10:20 localhost repmgrd: [2019-05-08 16:10:20] [INFO] monitoring primary node "hostname01" (node ID: 1) in normal state
May  8 16:15:21 localhost repmgrd: [2019-05-08 16:15:21] [INFO] monitoring primary node "hostname01" (node ID: 1) in normal state
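
A rough way to scan a longer journal export for anything other than these routine status lines is sketched below. It assumes repmgrd messages keep the "[timestamp] [LEVEL] message" shape seen in the two entries above; the names LOG_LINE, ROUTINE and interesting_entries are invented for this example, and treating only DEBUG and INFO as routine is an arbitrary choice.

```python
import re

# Matches the "[2019-05-08 16:10:20] [INFO] message" part of a repmgrd log line,
# regardless of any syslog prefix in front of it.
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"\[(?P<level>[A-Z]+)\] (?P<msg>.*)"
)

# Levels considered routine here (an assumption; adjust to taste).
ROUTINE = {"DEBUG", "INFO"}


def interesting_entries(lines):
    """Return (timestamp, level, message) for every non-routine log line."""
    found = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("level") not in ROUTINE:
            found.append((match.group("ts"), match.group("level"), match.group("msg")))
    return found


if __name__ == "__main__":
    import sys

    # Usage: pipe a log or journal export into the script on stdin.
    for ts, level, msg in interesting_entries(sys.stdin):
        print(ts, level, msg)
```

Fed only the two lines above, this prints nothing, which matches the observation that the log contains nothing beyond the periodic status message.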

May 08 '19 14:05 azet

Hi

Was either the local or upstream PostgreSQL node inactive/unreachable for a longer period? There is a memory leak in 4.2 which occurs when one of the monitored nodes is not available, which could explain what you are seeing.

BTW here: follow_command=''/usr/pgsql-11/bin/repmgr standby follow -f /etc/repmgr/11/repmgr.conf --log-to-file --upstream-node-id=%n'

you have an extra ', this should be: follow_command='/usr/pgsql-11/bin/repmgr standby follow -f /etc/repmgr/11/repmgr.conf --log-to-file --upstream-node-id=%n'
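
Stray quotes like this are easy to miss by eye. The following is a small sketch of a check for them, not repmgr's own parser: it assumes values are wrapped in single quotes, optionally followed by a "#" comment, and it does not handle escaped quotes. The function name check_quoting is made up for this example.

```python
def check_quoting(line):
    """Return None if a config line looks fine, else a short complaint."""
    stripped = line.strip()
    # Ignore blank lines, comment lines, and anything that is not name=value.
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return None
    name, value = stripped.split("=", 1)
    name, value = name.strip(), value.strip()
    if not value.startswith("'"):
        return None  # unquoted value; nothing to check in this sketch
    closing = value.find("'", 1)
    if closing == -1:
        return f"{name}: missing closing quote"
    # Only whitespace or a trailing comment may follow the closing quote.
    rest = value[closing + 1:].strip()
    if rest and not rest.startswith("#"):
        return f"{name}: unexpected text after closing quote"
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_quoting.py /etc/repmgr/11/repmgr.conf
    with open(sys.argv[1]) as fh:
        for number, text in enumerate(fh, start=1):
            problem = check_quoting(text)
            if problem:
                print(f"line {number}: {problem}")
```

With the doubled quote, the value is read as an empty string followed by leftover text, so the line is flagged; the corrected line passes.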

May 09 '19 00:05 ibarwick

Not really. The nodes were lagging behind in replication but were still part of the cluster; according to "cluster show", everything was fine.

Thanks for pointing out the typo!

May 09 '19 11:05 azet

> Not really, the nodes were behind replicating but part of the cluster according to cluster show everything was fine.

Hmm, could you provide the repmgrd logs anyway? Preferably for all nodes in the cluster; they may have some clues. If not appropriate for attaching here, please send to ian [at] 2ndquadrant.com.

May 10 '19 07:05 ibarwick

Well, we have repmgr(d) logging to the systemd journal, but all messages related to the incident look like the ones above (the earlier entries cover cluster creation and messages that the nodes are being monitored). I couldn't find anything suspicious myself during that episode; otherwise I would have put it in the initial post. It's a bit hard to give you the entire journal, as these are company production machines.

May 10 '19 10:05 azet
