metrics: scope metrics to active config, add optional per-host metrics
This PR builds atop @hussam-almarzoq's work in PR #6279, addresses the comments, and scopes the metrics per active config. The admin metrics are retained and not reset per config reload because they're instance/process-oriented. The HTTP metrics are reset at config reload, mostly because it's hard to know the diff to add/remove at reload, especially because radical config change may render previous metrics meaningless.
This also adds 2 more metrics:
-
caddy_config_last_reload_success_timestamp_seconds: records the timestamp of the last successful load -
caddy_config_last_reload_successful: records whether the last config load was successful or not
Supersede #6279 Closes #3784
I don't fully understand the metrics aspects, but if this works, that's big if true, yeah?
I will defer to others more proficient with the metrics libs, but if it works and doesn't cause significant regressions in performance, LGTM.
I don't fully understand the metrics aspects, but if this works, that's big if true, yeah?
There may be a small uptick in memory utilization, but should be negligible according to the discussion on https://github.com/caddyserver/caddy/issues/4644#issuecomment-2260524189. I don't expect the code to be any slower than what's currently experienced in #4644.
I'm checking out the 2.9.0 beta and I noticed that there aren't any of the Go or process metrics exposed (collectors.NewProcessCollector, collectors.NewProcessCollector from github.com/prometheus/client_golang/prometheus/collectors). Is that intended? Should the metrics endpoint omit these now?
I'm checking out the 2.9.0 beta and I noticed that there aren't any of the Go or process metrics exposed (
collectors.NewProcessCollector,collectors.NewProcessCollectorfromgithub.com/prometheus/client_golang/prometheus/collectors). Is that intended? Should themetricsendpoint omit these now?
I don't think that collector was ever used before. Why'd you assume it's included?
I saw the metrics before in v2.8.4. Things like go_info, go_goroutines, go_threads, go_gc_duration_seconds or process_cpu_seconds_total.
Maybe before the default registry was used so these 2 were always registered via this init() function: https://github.com/prometheus/client_golang/blob/6e3f4b1091875216850a486b1c2eb0e5ea852f98/prometheus/registry.go#L60-L63
I saw the metrics before in v2.8.4. Things like
go_info,go_goroutines,go_threads,go_gc_duration_secondsorprocess_cpu_seconds_total. Maybe before the default registry was used so these 2 were always registered via thisinit()function: https://github.com/prometheus/client_golang/blob/6e3f4b1091875216850a486b1c2eb0e5ea852f98/prometheus/registry.go#L60-L63
Ah, I'll fix it shortly. Thanks for the catch and the heads up!