tcp-shaker Add daemon mode which exposes prometheus metrics

Open IceWreck opened this issue 1 year ago • 1 comments

The Prometheus format is the de facto standard for emitting metrics these days. We wanted to record latency, run health checks, graph the variations in Grafana, and alert when something is out of place. So we added a daemon mode to tcp-shaker that runs the checker at regular intervals and serves an HTTP endpoint exposing the metrics.

CLI/one-off mode is still the default.
Daemon mode reads its config from a YAML file. This file holds the list of TCP addresses to check plus the options specific to daemon mode. Use CLI arguments for options that are common to both CLI and daemon modes.
Config has been renamed to CLIConfig to prevent confusion with DaemonConfig.
The concurrent checker now accepts the address as a parameter instead of directly using the one in the CLI config.
Some miscellaneous changes.
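For readers who want the general shape of such a daemon, here is a minimal sketch using only the Go standard library. It is not the code from this PR: the names `gauges`, `check` and `runDaemon` are hypothetical, the check uses a plain `net.DialTimeout` (a full connect) rather than tcp-shaker's own checker, the metrics text is rendered by hand instead of through the Prometheus client library, and the `requests_per_check` label is left out for brevity.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"sort"
	"strings"
	"sync"
	"time"
)

// gauges holds the latest check duration in ms per destination (hypothetical type).
type gauges struct {
	mu   sync.Mutex
	vals map[string]float64
}

func (g *gauges) set(dest string, ms float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.vals[dest] = ms
}

// render returns the gauges in the Prometheus text exposition format,
// sorted by destination so the output is stable.
// %q is adequate for plain host:port label values.
func (g *gauges) render() string {
	g.mu.Lock()
	defer g.mu.Unlock()
	dests := make([]string, 0, len(g.vals))
	for d := range g.vals {
		dests = append(dests, d)
	}
	sort.Strings(dests)
	var b strings.Builder
	b.WriteString("# TYPE tcpcheck_duration gauge\n")
	for _, d := range dests {
		fmt.Fprintf(&b, "tcpcheck_duration{destination=%q} %g\n", d, g.vals[d])
	}
	return b.String()
}

// check dials addr once and reports how long the attempt took.
// Assumption: a full TCP connect stands in for tcp-shaker's checker.
func check(addr string, timeout time.Duration) (time.Duration, error) {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return time.Since(start), err
	}
	conn.Close()
	return time.Since(start), nil
}

// runDaemon checks every address at each interval and serves /metrics.
func runDaemon(addrs []string, interval time.Duration, listen string) error {
	g := &gauges{vals: map[string]float64{}}
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for ; ; <-ticker.C { // run once immediately, then on every tick
			for _, a := range addrs {
				d, _ := check(a, time.Second) // errors ignored in this sketch
				g.set(a, float64(d.Milliseconds()))
			}
		}
	}()
	http.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, g.render())
	})
	return http.ListenAndServe(listen, nil)
}

func main() {
	// Show the output format with a sample value; a real daemon would
	// instead call runDaemon(addrs, interval, ":8785").
	g := &gauges{vals: map[string]float64{}}
	g.set("example.com:443", 256)
	fmt.Print(g.render())
}
```

A failed check still records how long the attempt took, so a timed-out destination shows up with a duration close to the timeout.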

An example YAML file is at app/tcp-checker/example.yaml

Test it with go run ./app/tcp-checker/ -d -f ./app/tcp-checker/example.yaml, wait a couple of seconds, then visit http://localhost:8785/metrics

The metrics look something like this:

# HELP error_count Number of errors occurred, partitioned by error type, destination address and number of requests per check.
# TYPE error_count counter
error_count{destination="example.com:443",error_type="connect",requests_per_check="1"} 0
error_count{destination="example.com:443",error_type="other",requests_per_check="1"} 0
error_count{destination="example.com:443",error_type="timeout",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="connect",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="other",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="timeout",requests_per_check="1"} 7
error_count{destination="google.com:80",error_type="connect",requests_per_check="1"} 0
error_count{destination="google.com:80",error_type="other",requests_per_check="1"} 0
error_count{destination="google.com:80",error_type="timeout",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="connect",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="other",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="timeout",requests_per_check="1"} 0
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
# HELP tcpcheck_duration TCP Check duration in ms, partitioned by destination address and number of requests per check.
# TYPE tcpcheck_duration gauge
tcpcheck_duration{destination="example.com:443",requests_per_check="1"} 256
tcpcheck_duration{destination="example.com:5454",requests_per_check="1"} 1000
tcpcheck_duration{destination="google.com:80",requests_per_check="1"} 57
tcpcheck_duration{destination="smtp.gmail.com:465",requests_per_check="1"} 82

Mar 30 '23 14:03 IceWreck

I believe this PR could be a great contribution to the TCP prober of the official blackbox exporter.

May 17 '23 03:05 tevino
