ansible-blackbox-exporter
blackbox_exporter is reaching the nofile limit
What happened?
Lots of "too many open files" errors if too many blackbox checks are added to Prometheus:
Mar 12 03:03:07 prom1 blackbox_exporter[37552]: 2020/03/12 03:03:07 http: Accept error: accept tcp 127.0.0.1:9115: accept4: too many open files; retrying in 20ms
Did you expect to see something different?
Checks are done without error
How to reproduce it (as minimally and precisely as possible):
Add a few thousand blackbox checks.
Environment
The server is running Debian 10. The default nofile soft limit set by systemd seems to be 1024; the hard limit is 524288.
Adding LimitNOFILE=65000 to the blackbox_exporter systemd unit, like in the Prometheus Ansible role, solves the problem permanently.
Seems like we are not setting LimitNOFILE in the systemd service file, so it should be inherited from the environment. In that case I don't think this is a problem with this role, and it can be adjusted by using a systemd "drop-in directory" (more in https://www.freedesktop.org/software/systemd/man/systemd.unit.html)
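For anyone hitting this, a minimal sketch of such a drop-in follows. It assumes the unit is named blackbox_exporter.service, and the file name limits.conf is arbitrary. The value 65000 is just the one the reporter said worked, not a recommendation.

```ini
# /etc/systemd/system/blackbox_exporter.service.d/limits.conf
# Assumed unit name: blackbox_exporter.service. Adjust the directory name if yours differs.
# Raises the open-file limit for this service only; 65000 is the value reported above.
[Service]
LimitNOFILE=65000

# After creating the file, reload systemd and restart the service:
#   systemctl daemon-reload
#   systemctl restart blackbox_exporter
```

Once the service has restarted, the "Max open files" row in /proc/<pid>/limits for the exporter process should show the new value.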
@SuperQ I see GitLab had an incident similar to this issue. Can you advise on a sensible value for LimitNOFILE?
This role has been deprecated in favor of the prometheus-community/ansible collection.