iptables_exporter
Shutdown if iptables-save fails
Hi,
Thanks for an interesting exporter. We are currently investigating using it for one of our deployments.
When developing the Ansible role for deploying it to a server, I discovered that the error handling works in a slightly unusual way: if the capabilities for the process described in https://github.com/retailnext/iptables_exporter#required-permissions are not available, each request to /metrics will result in an "exit status 1" error being logged to the syslog. The process will remain running.
I think this is perhaps not an ideal error handling strategy. It would be better if iptables-save were called right on start, and if it fails, the process aborted (exiting with a non-zero return code). That way, the semantics are much clearer to the user: "something is not working as it should".
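A minimal sketch of such a startup check, using only the Go standard library. This is illustrative, not the exporter's actual code; in the snippet below, "true" stands in for iptables-save so that it runs anywhere, as noted in the comments:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// verifyIptablesSave runs the given command once and returns its error.
// Running it at startup means a missing binary or missing capabilities
// is caught immediately rather than on the first scrape.
func verifyIptablesSave(name string, args ...string) error {
	return exec.Command(name, args...).Run()
}

func main() {
	// In a real exporter this would be "iptables-save"; "true" is used
	// here only so the sketch runs on any system.
	if err := verifyIptablesSave("true"); err != nil {
		fmt.Fprintf(os.Stderr, "startup check failed: iptables-save: %v\n", err)
		os.Exit(1) // abort with a non-zero code; systemd can then surface the failure
	}
	fmt.Println("startup check passed, continuing to serve /metrics")
}
```

With systemd's Restart=on-failure, a misconfigured deployment would then flap visibly at start instead of silently serving errors.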
Your thoughts on this, @eriksw?
Nov 6 13:43:00 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:43:00Z" level=info msg="Starting iptables_exporter (version=0.1.0, branch=master, revision=9172cd2acb4afc4ba77d48c85a46fbef36d5cb80)" source="iptables_exporter.go:153"
Nov 6 13:43:00 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:43:00Z" level=info msg="Build context (go=go1.10.3, user=root@f2efa59b1ef9, date=20180628-04:29:29)" source="iptables_exporter.go:154"
Nov 6 13:43:00 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:43:00Z" level=info msg="Listening on 0.0.0.0:9455" source="iptables_exporter.go:170"
Nov 6 13:43:46 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:43:46Z" level=error msg="exit status 1" source="iptables_exporter.go:95"
Nov 6 13:44:23 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:44:23Z" level=error msg="exit status 1" source="iptables_exporter.go:95"
Nov 6 13:44:24 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:44:24Z" level=error msg="exit status 1" source="iptables_exporter.go:95"
One concern I have with exiting out is if there are any circumstances where an exit status of 1 from iptables-save could be transient, but I think it's entirely reasonable to expect that the exporter is run from something that will restart it.
A related thought I have is that iptables_scrape_success might not be a good idea, that it might be better if the request from Prometheus gets an http error response, so that the exporter being in an unhealthy state (unable to run iptables-save) will trip a generic "master caution" alert (up == 0) within Prometheus to draw attention to the issue.
This project isn't a high priority for me at the moment, but I'd be happy to accept a PR improving on this in either or both ways. (Also errata like switching to go modules and bringing in a more up-to-date prometheus lib, etc...)
Thanks for the prompt reply. Yes, we run the exporter from systemd (as I presume most people would), so automatic restarts are simple in that sense.
A related thought I have is that iptables_scrape_success might not be a good idea, that it might be better if the request from Prometheus gets an http error response, so that the exporter being in an unhealthy state (unable to run iptables-save) will trip a generic "master caution" alert (up == 0) within Prometheus to draw attention to the issue.
Yep, that also sounds reasonable. (We have alerts for that specific kind of scenario, as a generic fallback in case we don't have more specific alerts for a given service)
This project isn't a high priority for me at the moment, but I'd be happy to accept a PR improving on this in either or both ways.
Would be an interesting experiment, but I don't really have any prior Go experience, and it's not very high on my priority list either. :slightly_smiling_face: (= other work is more important at the moment.) Given that the exporter typically works (if the capabilities are there), this is mostly about being able to properly handle edge cases.
I'd suggest we leave the issue open for some time, if for nothing else than to keep a record of this discussion having taken place. Maybe it will scratch an itch for someone else with more time/urgency to get this improved. :innocent:
Thanks @eriksw for this exporter. I was about to write an ugly shell script myself when I found this project.
Concerning the issue, I might add that the convention for Prometheus exporters has never been to change the HTTP return code when a scrape error occurs.
Signaling config or scrape errors in a dedicated metric is what node_exporter does, for example (e.g. for textfile collector metrics).
Having an error metric and a last-successful-scrape metric would be my suggestion, if anyone digs into that.
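As a sketch of what that could look like in the Prometheus exposition format (iptables_scrape_success already exists in the exporter; the second metric name here is only a suggestion, modeled on node_exporter's style, not something the exporter currently exposes):

```
# HELP iptables_scrape_success Whether the last iptables-save run succeeded (1 = yes, 0 = no).
# TYPE iptables_scrape_success gauge
iptables_scrape_success 0
# HELP iptables_last_successful_scrape_timestamp_seconds Unix time of the last successful iptables-save run.
# TYPE iptables_last_successful_scrape_timestamp_seconds gauge
iptables_last_successful_scrape_timestamp_seconds 1.573044e+09
```

An alert could then fire on iptables_scrape_success == 0, or on the timestamp metric growing stale, without relying on the generic up == 0 signal.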