caddy icon indicating copy to clipboard operation
caddy copied to clipboard

metrics: scope metrics to active config, add optional per-host metrics

Open mohammed90 opened this issue 1 year ago • 2 comments

This PR builds atop @hussam-almarzoq's work in PR #6279, addresses the comments, and scopes the metrics per active config. The admin metrics are retained and not reset per config reload because they're instance/process-oriented. The HTTP metrics are reset at config reload, mostly because it's hard to know the diff to add/remove at reload, especially because radical config change may render previous metrics meaningless.

This also adds 2 more metrics:

  • caddy_config_last_reload_success_timestamp_seconds: records the timestamp of the last successful load
  • caddy_config_last_reload_successful: records whether the last config load was successful or not

Supersede #6279 Closes #3784

mohammed90 avatar Aug 20 '24 20:08 mohammed90

I don't fully understand the metrics aspects, but if this works, that's big if true, yeah?

I will defer to others more proficient with the metrics libs, but if it works and doesn't cause significant regressions in performance, LGTM.

mholt avatar Aug 23 '24 16:08 mholt

I don't fully understand the metrics aspects, but if this works, that's big if true, yeah?

There may be a small uptick in memory utilization, but should be negligible according to the discussion on https://github.com/caddyserver/caddy/issues/4644#issuecomment-2260524189. I don't expect the code to be any slower than what's currently experienced in #4644.

mohammed90 avatar Aug 23 '24 16:08 mohammed90

I'm checking out the 2.9.0 beta and I noticed that there aren't any of the Go or process metrics exposed (collectors.NewProcessCollector, collectors.NewProcessCollector from github.com/prometheus/client_golang/prometheus/collectors). Is that intended? Should the metrics endpoint omit these now?

ueffel avatar Nov 22 '24 21:11 ueffel

I'm checking out the 2.9.0 beta and I noticed that there aren't any of the Go or process metrics exposed (collectors.NewProcessCollector, collectors.NewProcessCollector from github.com/prometheus/client_golang/prometheus/collectors). Is that intended? Should the metrics endpoint omit these now?

I don't think that collector was ever used before. Why'd you assume it's included?

mohammed90 avatar Nov 22 '24 21:11 mohammed90

I saw the metrics before in v2.8.4. Things like go_info, go_goroutines, go_threads, go_gc_duration_seconds or process_cpu_seconds_total. Maybe before the default registry was used so these 2 were always registered via this init() function: https://github.com/prometheus/client_golang/blob/6e3f4b1091875216850a486b1c2eb0e5ea852f98/prometheus/registry.go#L60-L63

ueffel avatar Nov 22 '24 22:11 ueffel

I saw the metrics before in v2.8.4. Things like go_info, go_goroutines, go_threads, go_gc_duration_seconds or process_cpu_seconds_total. Maybe before the default registry was used so these 2 were always registered via this init() function: https://github.com/prometheus/client_golang/blob/6e3f4b1091875216850a486b1c2eb0e5ea852f98/prometheus/registry.go#L60-L63

Ah, I'll fix it shortly. Thanks for the catch and the heads up!

mohammed90 avatar Nov 22 '24 22:11 mohammed90