tcp-shaker
Add daemon mode that exposes Prometheus metrics
The Prometheus format is more or less the de facto standard for emitting metrics these days. We wanted to record latency, run health checks, graph its variations in Grafana, and alert when something is out of place. So we added a daemon mode to tcp-shaker that runs the checker at regular intervals and serves the resulting metrics on an HTTP endpoint.
- CLI (one-off) mode is still the default.
- Daemon mode reads its configuration from a YAML file. This file lists the TCP addresses to check plus options specific to daemon mode. Options common to both CLI and daemon modes are still passed as CLI arguments.
- `Config` has been renamed to `CLIConfig` to avoid confusion with `DaemonConfig`.
- The concurrent checker now takes the address as a parameter instead of reading it directly from `CLIConfig`.
- Some miscellaneous changes.
An example YAML file is at `app/tcp-checker/example.yaml`.
Test it with `go run ./app/tcp-checker/ -d -f ./app/tcp-checker/example.yaml`, wait a couple of seconds, then visit http://localhost:8785/metrics.
The metrics look something like this:
```
# HELP error_count Number of errors occurred, partitioned by error type, destination address and number of requests per check.
# TYPE error_count counter
error_count{destination="example.com:443",error_type="connect",requests_per_check="1"} 0
error_count{destination="example.com:443",error_type="other",requests_per_check="1"} 0
error_count{destination="example.com:443",error_type="timeout",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="connect",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="other",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="timeout",requests_per_check="1"} 7
error_count{destination="google.com:80",error_type="connect",requests_per_check="1"} 0
error_count{destination="google.com:80",error_type="other",requests_per_check="1"} 0
error_count{destination="google.com:80",error_type="timeout",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="connect",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="other",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="timeout",requests_per_check="1"} 0
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
# HELP tcpcheck_duration TCP Check duration in ms, partitioned by destination address and number of requests per check.
# TYPE tcpcheck_duration gauge
tcpcheck_duration{destination="example.com:443",requests_per_check="1"} 256
tcpcheck_duration{destination="example.com:5454",requests_per_check="1"} 1000
tcpcheck_duration{destination="google.com:80",requests_per_check="1"} 57
tcpcheck_duration{destination="smtp.gmail.com:465",requests_per_check="1"} 82
```
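For reference, this output follows the Prometheus text exposition format: a `# HELP` line, a `# TYPE` line, then one `name{labels} value` line per series. The `promhttp_metric_handler_errors_total` series suggests the daemon serves it through client_golang's `promhttp` package, which does this formatting automatically. The sketch below only illustrates the format by hand; `sample` and `render` are hypothetical names, and quoting label values with Go's `%q` is adequate for ASCII values such as `host:port` but is not a complete implementation of the format's escaping rules.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is one labelled value of a metric.
type sample struct {
	labels map[string]string
	value  float64
}

// render writes one metric family in the text exposition format: a HELP
// line, a TYPE line, then one line per sample with labels sorted by name.
func render(name, help, typ string, samples []sample) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s %s\n", name, typ)
	for _, s := range samples {
		keys := make([]string, 0, len(s.labels))
		for k := range s.labels {
			keys = append(keys, k)
		}
		// Map iteration order is random; sort for stable output.
		sort.Strings(keys)
		pairs := make([]string, len(keys))
		for i, k := range keys {
			// %q wraps the value in double quotes and escapes quotes and
			// backslashes, which suffices for simple ASCII label values.
			pairs[i] = fmt.Sprintf("%s=%q", k, s.labels[k])
		}
		fmt.Fprintf(&b, "%s{%s} %g\n", name, strings.Join(pairs, ","), s.value)
	}
	return b.String()
}

func main() {
	fmt.Print(render("tcpcheck_duration",
		"TCP Check duration in ms, partitioned by destination address and number of requests per check.",
		"gauge",
		[]sample{
			{labels: map[string]string{"destination": "google.com:80", "requests_per_check": "1"}, value: 57},
			{labels: map[string]string{"destination": "smtp.gmail.com:465", "requests_per_check": "1"}, value: 82},
		}))
}
```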
I believe this PR could be a great contribution to the TCP prober of the official Prometheus blackbox exporter.