Removes high-cardinality labels from histogram metrics
This is to solve https://github.com/actions/actions-runner-controller/issues/3153
This removes the runner_id, runner_name, and job_workflow_ref labels from the job_startup_duration_seconds and job_execution_duration_seconds metrics. Dropping them reduces cardinality enough that usable histograms can be produced from these metrics. The idea is that startup and execution data will be grouped "per repo + workflow".
I'm unsure whether removing labelKeyJobWorkflowRef from jobLabels is the right approach, or whether the label lists should be reworked further.
Any nudge on this? Is there anything you'd like me to do to get this sorted?
I would love to see this change; it would make tracking workflow execution times in Grafana much easier 🙏
Plus one. Please review and accept this PR. Emitting high-cardinality metrics like this is explicitly discouraged by the prometheus/client_golang maintainers. It also effectively leads to unbounded memory growth unless pods are restarted.
https://github.com/prometheus/client_golang/issues/748 https://github.com/prometheus/client_golang/discussions/920
Closing this one since we introduced configurable metrics. Thank you for creating the PR!