barman
barman copied to clipboard
Barman long connection timeout to offline or not existing database
Hello,
We have an issue with long connection timeout to a database which is offline or not exists. It affects commands "barman check server" and "barman diagnose".
For example: "barman check" runs about 2 minutes:
$ time barman check barmandb
Server barmandb:
WAL archive: FAILED (please make sure WAL shipping is setup)
check timeout: FAILED (barman check command timed out)
real 2m11.233s
user 0m0.092s
sys 0m0.023s
And some more info in python profiler's log:
$ python -m cProfile -o /tmp/barman_cprofile.log /usr/bin/barman check barmandb
Server barmandb:
WAL archive: FAILED (please make sure WAL shipping is setup)
check timeout: FAILED (barman check command timed out)
Thu Nov 15 11:06:33 2018 barman_cprofile.log
306149 function calls (304162 primitive calls) in 131.972 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 131.973 131.973 /usr/bin/barman:22(<module>)
1 0.000 0.000 131.914 131.914 /usr/lib/python2.7/dist-packages/barman/cli.py:1145(main)
1 0.000 0.000 131.908 131.908 /usr/lib/python2.7/dist-packages/argh/helpers.py:53(dispatch)
1 0.000 0.000 131.908 131.908 /usr/lib/python2.7/dist-packages/argh/dispatching.py:65(dispatch)
1 0.000 0.000 131.908 131.908 /usr/lib/python2.7/dist-packages/argh/dispatching.py:210(_execute_command)
1 0.000 0.000 131.902 131.902 /usr/lib/python2.7/dist-packages/argh/dispatching.py:228(_call)
1 0.000 0.000 131.902 131.902 /usr/lib/python2.7/dist-packages/barman/cli.py:602(check)
1 0.000 0.000 131.827 131.827 /usr/lib/python2.7/dist-packages/barman/server.py:418(check)
1 0.000 0.000 131.826 131.826 /usr/lib/python2.7/dist-packages/barman/server.py:550(check_postgres)
2/1 0.000 0.000 131.826 131.826 /usr/lib/python2.7/dist-packages/barman/remote_status.py:57(get_remote_status)
1 0.000 0.000 131.826 131.826 /usr/lib/python2.7/dist-packages/barman/server.py:863(fetch_remote_status)
1 0.000 0.000 131.826 131.826 /usr/lib/python2.7/dist-packages/barman/postgres.py:678(fetch_remote_status)
1 0.000 0.000 131.826 131.826 /usr/lib/python2.7/dist-packages/barman/postgres.py:213(server_version)
1 0.000 0.000 131.826 131.826 /usr/lib/python2.7/dist-packages/barman/postgres.py:421(connect)
1 0.000 0.000 131.826 131.826 /usr/lib/python2.7/dist-packages/barman/postgres.py:147(connect)
1 0.000 0.000 131.826 131.826 /usr/lib/python2.7/dist-packages/psycopg2/__init__.py:85(connect)
1 131.826 131.826 131.826 131.826 {psycopg2._psycopg._connect}
1 0.000 0.000 0.075 0.075 /usr/lib/python2.7/dist-packages/barman/cli.py:997(get_server_list)
1 0.000 0.000 0.065 0.065 /usr/lib/python2.7/dist-packages/barman/config.py:790(server_names)
System: Debian stretch in kubernetes pod Barman version: 2.5 python-psycopg2: 2.7.5-2.pgdg90+1
Found that connection timeout for "psycopg2" can be configured by adding Postgres connection parameter "connect_timeout" https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
For example if connect_timeout is 2 seconds, check takes about 14 seconds:
conninfo = "host=barman user=barman dbname=barman connect_timeout=2"
streaming_conninfo = "host=barman user=streaming_barman dbname=barman connect_timeout=2"
# time barman check barmandb
Server barmandb:
WAL archive: FAILED (please make sure WAL shipping is setup)
PostgreSQL: FAILED
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 0 backups, expected at least 0)
pg_basebackup: OK
pg_basebackup compatible: FAILED (PostgreSQL version: None, pg_basebackup version: 11.1-1.pgdg90+1))
pg_receivexlog: OK
pg_receivexlog compatible: FAILED (PostgreSQL version: None, pg_receivexlog version: 11.1-1.pgdg90+1))
receive-wal running: FAILED (See the Barman log file for more details)
archiver errors: OK
real 0m14.638s
user 0m0.324s
sys 0m0.208s
And if connect_timeout is 5 seconds, Barman uses its option "check_timeout" with default value 30 seconds and check takes about 30 seconds
conninfo = "host=barman user=barman dbname=barman connect_timeout=5"
streaming_conninfo = "host=barman user=streaming_barman dbname=barman connect_timeout=5"
# time barman check barmandb
Server barmandb:
WAL archive: FAILED (please make sure WAL shipping is setup)
PostgreSQL: FAILED
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 0 backups, expected at least 0)
pg_basebackup: OK
pg_basebackup compatible: FAILED (PostgreSQL version: None, pg_basebackup version: 11.1-1.pgdg90+1))
check timeout: FAILED (barman check command timed out)
real 0m31.112s
user 0m0.308s
sys 0m0.224s
Hi @mvk15 - just working through the backlog of issues and wanted to query this one, is there still an issue here given it is possible to specify the connection_timeout
in the conninfo?
It does seems odd that the barman_check
timeout does not fire at the expected 30s in the first case you report, where there is no connection_timeout
.