prometheus-barman-exporter icon indicating copy to clipboard operation
prometheus-barman-exporter copied to clipboard

There are two processes after long time running and barman_metrics_update is stuck

Open silencej opened this issue 3 years ago • 8 comments

After a long time running, barman_metrics_update keeps to be the old value.

   56     1 root     S    23432   1%   1   0% python3 /barman_exporter.py -l 0.0.0.0:9780 -c 15
19844    56 root     S    23440   1%   0   0% python3 /barman_exporter.py -l 0.0.0.0:9780 -c 15

If I use -c 3600 it will work as normal. So I guess it's because in the case of a short cache_time, barman_exporter.py spawns another subprocess and confuses the prometheus scraper.

silencej avatar May 11 '21 09:05 silencej

The fix cannot fix it. @ahes Do you have any idea why the barman_exporter process will spawn itself?

silencej avatar Jun 28 '21 09:06 silencej

Add logging to trace. Here is the debugging output:

2021-06-29 09:24:47,105 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: started process
2021-06-29 09:24:47,106 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdin: parsed stdin as a None
2021-06-29 09:24:47,107 debug:645 61:140021602597664 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: <sh.StreamWriter object at 0x7f5951eb8fa0> ready for more input
2021-06-29 09:24:47,107 debug:645 61:140021602597664 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdin: done reading
2021-06-29 09:24:47,107 debug:645 61:140021602597664 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdin: closing, but flushing first
2021-06-29 09:24:47,108 debug:645 61:140021602597664 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdin: got chunk size 0 to flush: b''
2021-06-29 09:24:47,108 info:642 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023', pid 8391>: process started
2021-06-29 09:24:47,108 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: acquiring wait lock to wait for completion
2021-06-29 09:24:47,108 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: got wait lock
2021-06-29 09:24:47,108 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: exit code not set, waiting on pid
2021-06-29 09:24:47,466 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: <sh.StreamReader object at 0x7f5951e865e0> ready to be read from
2021-06-29 09:24:47,466 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdout: got chunk size 1024: b'{"my_db'
2021-06-29 09:24:47,466 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: <sh.StreamReader object at 0x7f5951e865e0> ready to be read from
2021-06-29 09:24:47,467 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdout: got chunk size 386: b'p": "20210627T150024", "retent'
2021-06-29 09:24:47,486 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: <sh.StreamReader object at 0x7f5951e865e0> ready to be read from
2021-06-29 09:24:47,487 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdout: got errno 5, done reading
2021-06-29 09:24:47,487 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>: <sh.StreamReader object at 0x7f5951e864f0> ready to be read from
2021-06-29 09:24:47,488 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdout: got chunk size 1410 to flush: b'{"my_db'
2021-06-29 09:24:47,488 debug:645 61:140021599111968 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023'>.<Process 8391 [b'/usr/bin/barman', b'-f', b'json', b'show-backup', b'my_db', b'20210628T150023']>.stdout: putting chunk onto pipe: b'{"my_db'
2021-06-29 09:24:47,489 debug:645 61:140021604444960 <Command '/usr/bin/barman -f json show-backup my_db 20210628T150023', pid 8391>: process completed

And top:

  61     1 root     S    23776   1%   1   0% python3 /barman_exporter.py -l 0.0.0.0:9780 -c 10
 8391    61 root     S    24820   1%   1   0% python3 /barman_exporter.py -l 0.0.0.0:9780 -c 10

Therefore, the extra subprocess is from python sh module.

silencej avatar Jun 30 '21 01:06 silencej

A workaround seems working now:

class BarmanCollectorCache:
    def __init__(self, barman, servers, cache_time):
        self.barman = barman
        self.servers = servers
#        self.cache_time = cache_time
        self._collect = []
        # self.start_collect_thread()

#    def start_collect_thread(self):
#        t = threading.Thread(target=self.collect_loop)
#        t.daemon = True
#        t.start()

    def collect_loop(self):
            logging.info("collect_loop----------")
#        while True:
            try:
                barman_collector = BarmanCollector(self.barman, self.servers)
                self._collect = list(barman_collector.collect())
#                time.sleep(self.cache_time)
            except Exception as e:
                logging.error("%s: Line %s: %s\n" % (type(e).__name__, e.__traceback__.tb_lineno, str(e)))

def start_exporter_service(args):
    try:
        addr, port = args.listen_address.split(":")
    except ValueError as e:
        raise ValueError("Incorrect '--listen-address' value: '{}'.".format(
            args.listen_address), "Use HOST:PORT.") from e

    registry = BarmanCollectorCache(Barman(), args.servers, args.cache_time)
    core.REGISTRY.register(registry)

    print("Listening on " + args.listen_address)
    prometheus_client.start_http_server(int(port), addr)

    while True:
#        time.sleep(1)
        registry.collect_loop()
        time.sleep(args.cache_time)

The idea is to save the thread running barman_cli, instead run barman_cli in the outer waiting loop (after prometheus_client.start_http_server).

silencej avatar Jun 30 '21 06:06 silencej

The previous solution still fails.

I plan to use subprocess to replace sh. Now still testing.

silencej avatar Jul 07 '21 09:07 silencej

@silencej Why would you want to run barman-exporter every 15 seconds? This is not enough time to run all barman CLI commands to get data about the backups. Probably that's the reason the second process is spawning.

marcinhlybin avatar Jul 09 '21 10:07 marcinhlybin

@silencej Do you manage barman-exporter with systemd? I don't see how the second process could be started. Short cache time shouldn't be the reason because the thread is waiting for data from barman collector anyway before sleeping.

I will run barman-exporter with 15 second cache time in my setup for 3 weeks and see if additional process appears. I am using systemd for running the barman-exporter.

marcinhlybin avatar Jul 09 '21 14:07 marcinhlybin

I run it with supervisord tgt with barman in a container.

On Fri, Jul 9, 2021 at 10:53 PM marcinhlybin @.***> wrote:

@silencej https://github.com/silencej Do you manage barman-exporter with systemd? I don't see how the second process could be started. Short cache time shouldn't be the reason because the thread is waiting for data from barman collector anyway before sleeping.

I will run barman-exporter with 15 second cache time in my setup for 3 weeks and see if additional process appears. I am using systemd for running the barman-exporter.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/marcinhlybin/prometheus-barman-exporter/issues/15#issuecomment-877246222, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIDENC644DXYONMCCRJEYDTW4ENJANCNFSM44U5UXHQ .

--

/*************************************************************************** ** Wang Chaofeng ** Software Engineer, Algorithm Engineer, Architect ** Shanghai, China ** http://www.wangchaofeng.com/ ***************************************************************************/

silencej avatar Jul 10 '21 02:07 silencej

I have a PR https://github.com/marcinhlybin/prometheus-barman-exporter/pull/16/files

I have run it for more than 3 days and the issue doesn't happen in-between. I use "subprocess" to replace "sh" module.

silencej avatar Jul 12 '21 08:07 silencej