ansible-blackbox-exporter

blackbox_exporter is reaching the nofile limit

Open · jqueuniet opened this issue on Mar 12, 2020

What happened?

Many "too many open files" errors appear when a large number of blackbox checks are added to Prometheus:

Mar 12 03:03:07 prom1 blackbox_exporter[37552]: 2020/03/12 03:03:07 http: Accept error: accept tcp 127.0.0.1:9115: accept4: too many open files; retrying in 20ms

Did you expect to see something different?

Checks complete without errors.

How to reproduce it (as minimally and precisely as possible):

Add a few thousand blackbox checks.

Environment

The server runs Debian 10. The default nofile soft limit set by systemd appears to be 1024; the hard limit is 524288.

Adding LimitNOFILE=65000 to the blackbox_exporter systemd unit, as the Prometheus Ansible role does, solves the problem permanently.

jqueuniet, Mar 12 '20 14:03

It seems we are not setting LimitNOFILE in the systemd service file, so the value is inherited from the environment. In that case I don't think this is a problem with this role, and the limit can be adjusted using a systemd "drop-in directory" (more in https://www.freedesktop.org/software/systemd/man/systemd.unit.html).
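A minimal sketch of such a drop-in, assuming the unit installed by this role is named blackbox_exporter.service (the file name limits.conf is an arbitrary choice) and reusing the 65000 value from the report rather than a recommended figure:

```ini
# /etc/systemd/system/blackbox_exporter.service.d/limits.conf
# Assumption: the unit is named blackbox_exporter.service; adjust the
# directory name if the service is installed under a different name.
[Service]
# A single value sets both the soft and the hard open-file limit.
# 65000 is the value from this issue, not a tuned recommendation.
LimitNOFILE=65000
```

After creating the file, run `systemctl daemon-reload` and restart the service so the new limit takes effect. `systemctl edit blackbox_exporter` creates an equivalent override.conf in the same drop-in directory.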

@SuperQ I see GitLab had an incident similar to this issue. Can you advise on a sensible value for LimitNOFILE?

paulfantom, Mar 12 '20 16:03

This role has been deprecated in favor of the prometheus-community/ansible collection.

SuperQ, Mar 06 '23 15:03