
Removes high-cardinality labels from histogram metrics

Open thomassandslyst opened this issue 1 year ago • 2 comments

This is to solve https://github.com/actions/actions-runner-controller/issues/3153

This removes the runner_id, runner_name, and job_workflow_ref labels from the job_startup_duration_seconds and job_execution_duration_seconds metrics. Dropping them reduces cardinality and makes it practical to build histograms from these metrics, with the idea that startup and execution data will be stored in "per repo + workflow" buckets.

I'm unsure whether removing labelKeyJobWorkflowRef from jobLabels is suitable, or whether the label lists should be reworked further to arrive at a better split.
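To illustrate why dropping these labels matters, here is a rough sketch (not the controller's code) that counts distinct label combinations before and after removing runner_id, runner_name, and job_workflow_ref. The `repository` and `workflow` label names, the sample values, and the assumption that job_workflow_ref varies per branch are all hypothetical and exist only for the illustration.

```python
# Hypothetical simulation: every distinct combination of label values
# becomes its own time series in Prometheus.

HIGH_CARDINALITY = {"runner_id", "runner_name", "job_workflow_ref"}


def drop_labels(sample, dropped):
    """Return a copy of the label set without the dropped label keys."""
    return {k: v for k, v in sample.items() if k not in dropped}


def series_count(samples):
    """Count distinct label-value combinations (one series per combination)."""
    return len({tuple(sorted(s.items())) for s in samples})


def simulate_jobs(repos, workflows, jobs_per_workflow):
    """Build one label set per job; each job gets a fresh ephemeral runner."""
    samples = []
    runner_id = 0
    for repo in repos:
        for wf in workflows:
            for n in range(jobs_per_workflow):
                runner_id += 1
                samples.append({
                    "repository": repo,   # assumed label name
                    "workflow": wf,       # assumed label name
                    "runner_id": str(runner_id),
                    "runner_name": f"runner-{runner_id}",
                    # Assumed shape: the ref changes with the branch.
                    "job_workflow_ref": (
                        f"{repo}/.github/workflows/{wf}@refs/heads/branch-{n % 5}"
                    ),
                })
    return samples


samples = simulate_jobs(["org/a", "org/b"], ["ci.yml", "release.yml", "lint.yml"], 200)
before = series_count(samples)
after = series_count([drop_labels(s, HIGH_CARDINALITY) for s in samples])
print(before, after)  # 1200 6
```

In this toy setup every job creates a new series while the runner labels are present, so the count grows with the number of jobs. Without them it is bounded by repos × workflows. For a histogram, each of those series is further multiplied by the number of buckets.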

thomassandslyst avatar May 30 '24 12:05 thomassandslyst

Any nudge on this? Is there anything you'd like me to do to get this sorted?

thomassandslyst avatar Jul 02 '24 10:07 thomassandslyst

I would love to see this change, it would make tracking workflow execution times in Grafana much easier 🙏

wwalters12 avatar Jul 15 '24 13:07 wwalters12

+1, please review and accept this PR. Emitting high-cardinality metrics like this is explicitly discouraged by the prometheus/client_golang maintainers. It also leads to effectively unbounded memory growth unless pods are restarted.

https://github.com/prometheus/client_golang/issues/748 https://github.com/prometheus/client_golang/discussions/920

mikespharss avatar Oct 24 '24 20:10 mikespharss

Closing this one since we introduced configurable metrics. Thank you for creating the PR!

nikola-jokic avatar May 07 '25 13:05 nikola-jokic