prometheus-barman-exporter
There are two processes after running for a long time and barman_metrics_update is stuck
After running for a long time, barman_metrics_update keeps reporting the old value.
56 1 root S 23432 1% 1 0% python3 /barman_exporter.py -l 0.0.0.0:9780 -c 15
19844 56 root S 23440 1% 0 0% python3 /barman_exporter.py -l 0.0.0.0:9780 -c 15
If I use -c 3600 it works as normal. So I guess that with a short cache_time, barman_exporter.py spawns another subprocess and confuses the Prometheus scraper.
That fix does not resolve it. @ahes Do you have any idea why the barman_exporter process spawns a copy of itself?
I added logging to trace it. Here is the debugging output:
2021-06-29 09:24:47,105 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: started process
2021-06-29 09:24:47,106 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdin: parsed stdin as a None
2021-06-29 09:24:47,107 debug:645 61:140021602597664 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: <sh.StreamWriter object at 0x7f5951eb8fa0> ready for more input
2021-06-29 09:24:47,107 debug:645 61:140021602597664 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdin: done reading
2021-06-29 09:24:47,107 debug:645 61:140021602597664 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdin: closing, but flushing first
2021-06-29 09:24:47,108 debug:645 61:140021602597664 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdin: got chunk size 0 to flush: b''
2021-06-29 09:24:47,108 info:642 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023', pid 8391>: process started
2021-06-29 09:24:47,108 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: acquiring wait lock to wait for completion
2021-06-29 09:24:47,108 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: got wait lock
2021-06-29 09:24:47,108 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: exit code not set, waiting on pid
2021-06-29 09:24:47,466 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: <sh.StreamReader object at 0x7f5951e865e0> ready to be read from
2021-06-29 09:24:47,466 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdout: got chunk size 1024: b'{"my_db'
2021-06-29 09:24:47,466 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: <sh.StreamReader object at 0x7f5951e865e0> ready to be read from
2021-06-29 09:24:47,467 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdout: got chunk size 386: b'p": "20210627T150024", "retent'
2021-06-29 09:24:47,486 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: <sh.StreamReader object at 0x7f5951e865e0> ready to be read from
2021-06-29 09:24:47,487 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdout: got errno 5, done reading
2021-06-29 09:24:47,487 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: <sh.StreamReader object at 0x7f5951e864f0> ready to be read from
2021-06-29 09:24:47,488 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdout: got chunk size 1410 to flush: b'{"my_db'
2021-06-29 09:24:47,488 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdout: putting chunk onto pipe: b'{"my_db'
2021-06-29 09:24:47,489 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023', pid 8391>: process completed
And top:
61 1 root S 23776 1% 1 0% python3 /barman_exporter.py -l 0.0.0.0:9780 -c 10
8391 61 root S 24820 1% 1 0% python3 /barman_exporter.py -l 0.0.0.0:9780 -c 10
Therefore, the extra subprocess comes from the Python sh module.
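For context, a minimal sketch of why a second identical python3 process can appear in ps (this is not the sh module's actual code, just the general fork()+exec() pattern such libraries rely on): between fork() and exec() the child is a copy of the parent, so ps briefly lists a second process with the parent's own command line, matching the duplicate "python3 /barman_exporter.py" entry above.

```python
import os
import time

# fork() duplicates the current interpreter; until the child calls
# exec(), ps shows two processes with identical python3 command lines.
pid = os.fork()
if pid == 0:
    # Child: during this window ps would show the python3 clone.
    time.sleep(0.1)
    os.execvp("true", ["true"])  # replace the image with the real command
else:
    # Parent: wait for the child, as a library does to collect the exit code.
    _, status = os.waitpid(pid, 0)
    print("child exit code:", os.waitstatus_to_exitcode(status))
```

If the child hangs before exiting (or the parent never reaps it), the extra process lingers, which would match the stuck second process observed above.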
A workaround seems to be working now:
class BarmanCollectorCache:
    def __init__(self, barman, servers, cache_time):
        self.barman = barman
        self.servers = servers
        # self.cache_time = cache_time
        self._collect = []
        # self.start_collect_thread()

    # def start_collect_thread(self):
    #     t = threading.Thread(target=self.collect_loop)
    #     t.daemon = True
    #     t.start()

    def collect_loop(self):
        logging.info("collect_loop----------")
        # while True:
        try:
            barman_collector = BarmanCollector(self.barman, self.servers)
            self._collect = list(barman_collector.collect())
            # time.sleep(self.cache_time)
        except Exception as e:
            logging.error("%s: Line %s: %s\n" % (type(e).__name__, e.__traceback__.tb_lineno, str(e)))
def start_exporter_service(args):
    try:
        addr, port = args.listen_address.split(":")
    except ValueError as e:
        raise ValueError("Incorrect '--listen-address' value: '{}'.".format(
            args.listen_address), "Use HOST:PORT.") from e

    registry = BarmanCollectorCache(Barman(), args.servers, args.cache_time)
    core.REGISTRY.register(registry)
    print("Listening on " + args.listen_address)
    prometheus_client.start_http_server(int(port), addr)
    while True:
        # time.sleep(1)
        registry.collect_loop()
        time.sleep(args.cache_time)
The idea is to drop the thread that runs barman_cli and instead run barman_cli in the outer waiting loop (after prometheus_client.start_http_server).
The previous workaround still fails.
I plan to use subprocess to replace sh. Still testing.
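A minimal sketch of what the subprocess-based replacement could look like (the helper name and the 60-second timeout are my assumptions, not the actual patch): subprocess.run() waits for the child itself and leaves no long-lived forked interpreter or reader threads behind.

```python
import json
import subprocess


def run_json_command(argv, timeout=60):
    """Run a command and parse its stdout as JSON.

    Hypothetical helper replacing the sh module call; raises
    CalledProcessError on a non-zero exit and TimeoutExpired if
    the command hangs, so a stuck barman run cannot block forever.
    """
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=True
    )
    return json.loads(result.stdout)


# Usage with the command from the debug log above:
# run_json_command(["/usr/bin/barman", "-f", "json",
#                   "show-backup", "my_db", "20210628T150023"])
```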
@silencej Why would you want to run barman-exporter every 15 seconds? This is not enough time to run all barman CLI commands to get data about the backups. Probably that's the reason the second process is spawning.
@silencej Do you manage barman-exporter with systemd? I don't see how the second process could be started. Short cache time shouldn't be the reason because the thread is waiting for data from barman collector anyway before sleeping.
I will run barman-exporter with 15 second cache time in my setup for 3 weeks and see if additional process appears. I am using systemd for running the barman-exporter.
I run it with supervisord, together with barman, in a container.
I have a PR: https://github.com/marcinhlybin/prometheus-barman-exporter/pull/16/files
I have run it for more than 3 days and the issue has not occurred since. I used the "subprocess" module to replace the "sh" module.