Add Prometheus Metric for number of jobs waiting for self-hosted runners
What would you like added?
Add a Prometheus metric github_workflow_jobs_in_queue to track the number of jobs currently waiting for self-hosted runners. This metric would help monitor and analyse how quickly the infrastructure adds pods in response to demand.
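To illustrate the intended semantics, here is a minimal sketch of how such a gauge could be derived from workflow_job webhook events. It assumes the value is the number of jobs that have been queued but not yet picked up by a runner. The QueueGauge class and its method names are hypothetical and are not part of the controller. Only the event actions (queued, in_progress, completed) come from the GitHub webhook.

```python
from dataclasses import dataclass, field


@dataclass
class QueueGauge:
    """Hypothetical helper: tracks jobs that are queued but not yet started."""

    waiting: set = field(default_factory=set)

    def handle(self, job_id: int, action: str) -> None:
        # A "queued" event means the job is waiting for a runner.
        if action == "queued":
            self.waiting.add(job_id)
        # Once a job starts (or finishes without ever starting, e.g. it was
        # cancelled), it no longer counts as waiting.
        elif action in ("in_progress", "completed"):
            self.waiting.discard(job_id)

    @property
    def value(self) -> int:
        # The number a github_workflow_jobs_in_queue gauge would report.
        return len(self.waiting)


gauge = QueueGauge()
events = [
    (1, "queued"),
    (2, "queued"),
    (1, "in_progress"),
    (3, "queued"),
    (2, "completed"),
]
for job_id, action in events:
    gauge.handle(job_id, action)

print(gauge.value)
```

Tracking job IDs rather than a bare counter keeps the value correct if an event is delivered twice, at the cost of holding the IDs in memory.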
Why is this needed?
This metric is needed to give better visibility into how quickly the infrastructure reacts to job requests. If the queue is long, jobs are not being started in a timely manner, or the system cannot keep up with the number of requests. Users notice these delays.
A large queue gives the administrator a clear indicator that jobs are not being assigned to pods in time.
Additional context
none
I was looking into this as well, but isn't that what github_workflow_jobs_queued_total is supposed to do?
Thanks. That is most probably it. We will check.
This metric is not present in the listener helm chart, so I'm not sure how we can use github_workflow_jobs_queued_total.