patroni
patroni copied to clipboard
Generated pg_basebackup command should be logged
What happened?
When Patroni initializes a node, the pg_basebackup command it generates is not logged, and it's challenging to try to reproduce from the configuration. I ran into an issue recently where there was a lingering hidden file in a custom WAL directory when I was trying to reinitialize, and it took me hours to figure out why. This is not the first time I've run into this - it's tripped me up at least a half dozen times over the past few years. It should not be this hard to determine what is going wrong, and logging the generated command would allow me to quickly copy/paste and try to run it in the foreground. Perhaps including logging of the pg_basebackup error output would be in order as well, at least when log level is set to DEBUG.
How can we reproduce it (as minimally and precisely as possible)?
Reinitialize a node and watch the Patroni log, at either INFO or DEBUG log level.
What did you expect to happen?
Something more than this:
ERROR: Error when fetching backup: pg_basebackup exited with code=1 ERROR: failed to bootstrap from leader '...'
Patroni/PostgreSQL/DCS version
- Patroni version: 3.2.1
- PostgreSQL version: 14.10
- DCS (and its version): Not relevant
Patroni configuration file
---
scope: 14-truth
namespace: /db/
name: prime-br2-pg-truth-0
bootstrap:
dcs:
loop_wait: 3
maximum_lag_on_failover: 1048576
retry_timeout: 10
ttl: 30
postgresql:
parameters:
archive_command: envdir /etc/wal-g/14-truth /usr/local/bin/wal-g wal-push "%p"
archive_mode: true
archive_timeout: 5min
autovacuum_vacuum_cost_delay: 10ms
autovacuum_vacuum_scale_factor: 0.2
autovacuum_vacuum_threshold: 10000
checkpoint_completion_target: 0.9
checkpoint_timeout: 30min
cluster_name: truth
default_statistics_target: 100
effective_cache_size: 252GB
effective_io_concurrency: 200
hot_standby_feedback: true
huge_pages: try
idle_in_transaction_session_timeout: 15min
log_autovacuum_min_duration: 0
log_checkpoints: true
log_connections: true
log_destination: stderr
log_directory: /var/log/postgresql/14/truth
log_disconnections: true
log_duration: false
log_error_verbosity: DEFAULT
log_file_mode: '0640'
log_filename: '%Y-%m-%d_%H.log'
log_line_prefix: '%m [%p]: %qdb=%d,user=%u,app=%a,client=%h %v '
log_lock_waits: true
log_min_duration_statement: 250ms
log_min_messages: WARNING
log_recovery_conflict_waits: true
log_replication_commands: false
log_rotation_age: 1h
log_rotation_size: 0
log_statement: none
log_temp_files: 0
log_timezone: Etc/UTC
logging_collector: true
maintenance_work_mem: 2GB
max_connections: 2500
max_logical_replication_workers: 24
max_parallel_maintenance_workers: 4
max_parallel_workers: 64
max_parallel_workers_per_gather: 4
max_replication_slots: 64
max_stack_depth: 7MB
max_sync_workers_per_subscription: 24
max_wal_senders: 16
max_wal_size: 120GB
min_wal_size: 70GB
max_worker_processes: 64
pg_stat_statements.track: all
random_page_cost: 1.1
restore_command: envdir /etc/wal-g/14-truth /usr/local/bin/wal-g wal-fetch "%f" "%p"
shared_buffers: 126GB
shared_preload_libraries: pg_stat_statements,pg_cron
stats_temp_directory: /run/postgresql/14-truth.pg_stat_tmp
superuser_reserved_connections: 30
temp_file_limit: 4GB
timezone: Etc/UTC
track_activities: true
track_activity_query_size: 4096
track_functions: all
track_io_timing: true
unix_socket_directories: /run/postgresql
wal_buffers: 64MB
wal_keep_size: 60GB
wal_level: logical
work_mem: 64MB
pg_hba:
REDACTED
pg_ident:
REDACTED
initdb:
- allow-group-access
- data-checksums
- encoding: UTF8
- locale: en_US.UTF8
- waldir: /var/lib/postgresql/wal/14/truth
users:
REDACTED
etcd3:
hosts:
- prime-br2-etcd-0:2379
- prime-br2-etcd-1:2379
- prime-br2-etcd-2:2379
log:
level: INFO
dir: /var/log/patroni/14-truth
file_num: 4
file_size: 16777216
postgresql:
authentication:
replication:
username: replication
password: 'REDACTED'
rewind:
username: postgres
password: 'REDACTED'
superuser:
username: postgres
password: 'REDACTED'
basebackup:
- checkpoint: fast
- waldir: /var/lib/postgresql/wal/14/truth
bin_dir: /usr/lib/postgresql/14/bin
connect_address: prime-br2-pg-truth-0:5432
create_replica_methods:
- basebackup
data_dir: /var/lib/postgresql/14/truth
listen: '*:5432'
pgpass: /var/lib/postgresql/.pgpass.14-truth
use_pg_rewind: true
use_unix_socket: true
restapi:
listen: '*:8008'
connect_address: prime-br2-pg-truth-0:8008
patronictl show-config
Not relevant
Patroni log files
ERROR: Error when fetching backup: pg_basebackup exited with code=1
ERROR: failed to bootstrap from leader '...'
PostgreSQL log files
N/A, it never gets that far.
Have you tried to use GitHub issue search?
- [X] Yes
Anything else we need to know?
No response
When Patroni initializes a node, the pg_basebackup command it generates is not logged, and it's challenging to try to reproduce from the configuration
Well, just logging a command to be executed isn't a problem, but it will not be a simple copy&paste for you, mainly because the password is never supplied directly.
I ran into an issue recently where there was a lingering hidden file in a custom WAL directory when I was trying to reinitialize, and it took me hours to figure out why. This is not the first time I've run into this - it's tripped me up at least a half dozen times over the past few years. It should not be this hard to determine what is going wrong
This is more or less related to your logging setup. I bet you configured Patroni to write logs directly to a files. In this case logs of any other tool or script still go to the stderr and end up somewhere else (syslog?). And you haven't checked it this place.
Perhaps including logging of the pg_basebackup error output would be in order as well, at least when log level is set to DEBUG.
Regarding capturing pg_basebackup logs - I am strongly against it, because for example it will break the --verbose
mode.
I have no desire for the password to be logged, and would hope that Patroni would not put it into the call itself as then it could be exposed via a process list. Does Patroni set the PGPASSWORD
environment variable for this? It's complicated to try to piece together the full pg_basebackup
command otherwise though, from some parameters that are explicitly defined, others which are defaulted, and at least the hostname being based on the current cluster state. I'd love a command to copy/paste and try running as the postgres system user, which would prompt for the password unless I first set PGPASSWORD
.
You are correct in that I've configured Patroni to log to discrete files. I did not think to check syslog, but looking back now, I do not see anything related logged there at all. As far as the syslog configuration, I'm using a default Ubuntu 20.04 configuration.
Lastly, part of the problem is that there's a bug in the Debian packaging wrapper that is used when running pg_basebackup
without an explicit path to the binary directory for the specific PostgreSQL version. If that path is included with the logged command executed, that would solve that confusion as well. I'll also get a bug report filed with the Debian package maintainers about that. FYI, the wrapper prints out an inaccurate and misleading error:
$ pg_basebackup --version
Error: /var/lib/postgresql/14/main is not accessible; please fix the directory permissions (/var/lib/postgresql/14/ should be world readable)
The actual binary from the Postgres packages does not throw that error for --version
, and furthermore does not state that the parent directory needs to be world-readable, which is not actually true.
I do not see anything related logged there at all.
That's very suspicious. Whatever pg_basebackup writes to stdout/stderr should end up in one or another file.
Lastly, part of the problem is that there's a bug in the Debian packaging wrapper that is used when running pg_basebackup without an explicit path to the binary directory for the specific PostgreSQL version. If that path is included with the logged command executed
It's very arguable about bug in Debian wrapper scripts. It seems that you neither set postgresql.bin_dir
nor made sure that postgres binaries are available in $PATH [1]. Such Debain wrapper scripts are available only for client binaries like psql
and pg_basebackup
, and not for postgres
, pg_ctl
, pg_rewind
and so on. Even if pg_basebackup succeed everything would fail because Patroni can't find the postgres
binary.
If that path is included with the logged command executed, that would solve that confusion as well
That's the thing. If path isn't specified in the Patroni config, the binary is executed from the default $PATH.
What makes things even worse - you didn't show the entire Patroni configuration file. Lack of postgresql.bin_dir
is an error that is very easy to spot.
[1] https://patroni.readthedocs.io/en/latest/yaml_configuration.html#postgresql
bin_dir: (optional) Path to PostgreSQL binaries (pg_ctl, initdb, pg_controldata, pg_basebackup, postgres, pg_isready, pg_rewind). If not provided or is an empty string, PATH environment variable will be used to find the executables.
I do set postgresql.bindir, but I did not think to include that when attempting to replicate manually at first. Another good reason for having the entire generated command logged. I can update this ticket tomorrow with my entire configuration, sans passwords.
@caseyandgina I don't mind adding a debug log for that. Please open a PR.
@CyberDem0n I updated the ticket to contain a mostly-complete Patroni YML, with sensitive information redacted. I'm not a developer and am not sure how to make a PR, unfortunately.