
Add daemon mode which exposes prometheus metrics

Open • IceWreck opened this issue 1 year ago • 1 comment

The Prometheus format is the de facto standard for emitting metrics these days. We wanted to record latency, run health checks, graph variations in Grafana, and alert if something is out of place. So we added a daemon mode to tcp-shaker that runs the checker at regular intervals and serves an HTTP endpoint exposing the metrics.

  • CLI/one-off mode is still the default.
  • Daemon mode reads its config from a YAML file. This file holds the list of TCP addresses to check plus the options specific to daemon mode. Options common to CLI and daemon modes are still passed as CLI arguments.
  • Config has been renamed to CLIConfig to prevent confusion with DaemonConfig.
  • The concurrent checker now accepts the address as a parameter instead of reading it directly from the CLI config.
  • Some misc changes
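
To make the shape of the daemon mode concrete, here is a rough, stdlib-only sketch of the idea, not the code in this PR. All names (store, dialCheck, checkAll, runDaemon) are hypothetical, a plain net.DialTimeout stands in for tcp-shaker's checker, errors are ignored, and the metrics text is written by hand rather than through the Prometheus client library. It shows a checker that receives the address as a parameter, a ticker loop that repeats the checks, and a handler that can be mounted at /metrics.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"sort"
	"strings"
	"sync"
	"time"
)

// store keeps the latest check duration per destination (hypothetical type).
type store struct {
	mu sync.Mutex
	ms map[string]float64 // destination -> duration in milliseconds
}

func (s *store) set(addr string, ms float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ms[addr] = ms
}

// render emits the gauge in the Prometheus text exposition format.
func (s *store) render() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	addrs := make([]string, 0, len(s.ms))
	for a := range s.ms {
		addrs = append(addrs, a)
	}
	sort.Strings(addrs) // stable output order
	var b strings.Builder
	b.WriteString("# TYPE tcpcheck_duration gauge\n")
	for _, a := range addrs {
		fmt.Fprintf(&b, "tcpcheck_duration{destination=%q} %g\n", a, s.ms[a])
	}
	return b.String()
}

// ServeHTTP lets the store be mounted as the /metrics handler.
func (s *store) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	fmt.Fprint(w, s.render())
}

// dialCheck is a stand-in checker: a full TCP connect, errors ignored.
func dialCheck(addr string, timeout time.Duration) time.Duration {
	start := time.Now()
	if conn, err := net.DialTimeout("tcp", addr, timeout); err == nil {
		conn.Close()
	}
	return time.Since(start)
}

// checkAll runs one round; each goroutine receives its address as a parameter.
func checkAll(addrs []string, timeout time.Duration, s *store) {
	var wg sync.WaitGroup
	for _, addr := range addrs {
		wg.Add(1)
		go func(addr string) {
			defer wg.Done()
			s.set(addr, float64(dialCheck(addr, timeout).Milliseconds()))
		}(addr)
	}
	wg.Wait()
}

// runDaemon repeats checkAll every interval until ctx is cancelled.
func runDaemon(ctx context.Context, addrs []string, interval, timeout time.Duration, s *store) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		checkAll(addrs, timeout, s)
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	// Demo only: check a local listener for a short while, then print the metrics.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	ctx, cancel := context.WithTimeout(context.Background(), 250*time.Millisecond)
	defer cancel()
	s := &store{ms: map[string]float64{}}
	runDaemon(ctx, []string{ln.Addr().String()}, 100*time.Millisecond, time.Second, s)
	fmt.Print(s.render())
}
```

In a long-running process one would presumably start runDaemon in a goroutine with a context that is only cancelled on shutdown, and pass the store to http.ListenAndServe on the port the description uses (8785). The requests_per_check label and the error counter from the real output are left out here to keep the sketch short.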

An example YAML file is at app/tcp-checker/example.yaml

Test it with go run ./app/tcp-checker/ -d -f ./app/tcp-checker/example.yaml, wait a couple of seconds, then visit http://localhost:8785/metrics

The metrics look something like this:

# HELP error_count Number of errors occurred, partitioned by error type, destination address and number of requests per check.
# TYPE error_count counter
error_count{destination="example.com:443",error_type="connect",requests_per_check="1"} 0
error_count{destination="example.com:443",error_type="other",requests_per_check="1"} 0
error_count{destination="example.com:443",error_type="timeout",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="connect",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="other",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="timeout",requests_per_check="1"} 7
error_count{destination="google.com:80",error_type="connect",requests_per_check="1"} 0
error_count{destination="google.com:80",error_type="other",requests_per_check="1"} 0
error_count{destination="google.com:80",error_type="timeout",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="connect",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="other",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="timeout",requests_per_check="1"} 0
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
# HELP tcpcheck_duration TCP Check duration in ms, partitioned by destination address and number of requests per check.
# TYPE tcpcheck_duration gauge
tcpcheck_duration{destination="example.com:443",requests_per_check="1"} 256
tcpcheck_duration{destination="example.com:5454",requests_per_check="1"} 1000
tcpcheck_duration{destination="google.com:80",requests_per_check="1"} 57
tcpcheck_duration{destination="smtp.gmail.com:465",requests_per_check="1"} 82

IceWreck avatar Mar 30 '23 14:03 IceWreck

I believe this PR could be a great contribution to the TCP prober of the official blackbox exporter.

tevino avatar May 17 '23 03:05 tevino