actions-runner-controller

Incorrect reporting of histogram type metrics from listener pod

Open 1MaxKoval opened this issue 1 year ago • 3 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

0.7.0

Deployment Method

Helm

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

  1. Install the controller chart (see controller.yaml attached below):
NAMESPACE="arc-systems"
helm install arc \
    --namespace "${NAMESPACE}" \
    --create-namespace \
    -f controller.yaml \
    --version "0.7.0" \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller
  2. Install the runner scale set chart (see runner-set.yaml attached below):
INSTALLATION_NAME="arc-runner-set"
NAMESPACE="arc-runners"
helm install "${INSTALLATION_NAME}" \
    --namespace "${NAMESPACE}" \
    --create-namespace \
    -f runner-set.yaml \
    --version "0.7.0" \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
  3. Once the controller, listener, and runner pods are ready, trigger a job from GitHub and wait for it to complete.

Here is the workflow used for the test:

name: ARC test
on:
  workflow_dispatch:
    inputs:
      sleepTime:
        description: "Seconds to sleep for"
        default: 2

jobs:
  print:
    runs-on: "test-runner-set"
    container:
      image: docker.io/busybox:latest
    steps:
    - run: sleep ${{ github.event.inputs.sleepTime }}

Note: A potentially important detail is that my organisation uses GitHub Enterprise Server, i.e. the test job was dispatched from a GitHub Enterprise Server instance.

  4. Open another terminal window and port-forward to the listener's metrics port (the listener pod runs in the controller's namespace):
kubectl port-forward -n arc-systems <listener-pod-name> <your-local-port>:8080
  5. Observe the emitted metrics.

While the port-forward command is running, make an HTTP request to the metrics endpoint, e.g. with curl:

curl http://localhost:<your-local-port>/metrics
  6. If you schedule more jobs to this runner set (using the same workflow) with varying execution times, you should notice that every metric ending in _bucket is incremented regardless of the actual execution duration.
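To make the last step easier to check, the endpoint output can be narrowed down to a single histogram. This is a minimal sketch, assuming bash with standard grep and sed; `show_histogram` is a hypothetical helper (not part of ARC) that reads Prometheus text on stdin and prints each bucket's `le` bound and value, followed by the `_sum` and `_count` values.

```shell
#!/usr/bin/env bash
# show_histogram (hypothetical helper): read Prometheus text on stdin and print
# "le value" for every bucket of the named histogram, then "sum value" and
# "count value". Braces are matched with [{] / [}] to stay portable across regex flavours.
show_histogram() {
  local metric="$1"
  local text
  text="$(cat)"
  # Bucket lines: keep only the le label and the sample value.
  printf '%s\n' "$text" \
    | grep -E "^${metric}_bucket[{]" \
    | sed -E 's/.*le="([^"]+)"[}] ([^ ]+).*/\1 \2/'
  # _sum and _count lines: keep only the suffix and the sample value.
  printf '%s\n' "$text" \
    | grep -E "^${metric}_(sum|count)[{]" \
    | sed -E "s/^${metric}_(sum|count)[{].*[}] ([^ ]+).*/\1 \2/"
}
```

For example, pipe `curl -s http://localhost:<your-local-port>/metrics` into `show_histogram gha_job_execution_duration_seconds` after each job and compare the bucket values with the job's actual duration.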

Describe the bug

The listener pod exposes histograms with buckets in the Prometheus exposition format. However, the bucket values appear to be incorrect: every bucket is incremented regardless of the actual job execution or startup time.

After the runner set completes a single job with an execution duration of 2 seconds, the listener pod's metrics endpoint outputs the following:

# HELP gha_job_execution_duration_seconds Time spent executing workflow jobs by the scale set (in seconds).
# TYPE gha_job_execution_duration_seconds histogram
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="0.01"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="0.05"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="0.1"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="0.5"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="1"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="2"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="3"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="4"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="5"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="6"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="7"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="8"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="9"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="10"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="12"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="15"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="18"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="20"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="25"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="30"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="40"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="50"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="60"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="70"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="80"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="90"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="100"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="110"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="120"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="150"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="180"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="210"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="240"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="300"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="360"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="420"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="480"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="540"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="600"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="900"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="1200"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="1800"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="2400"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="3000"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="3600"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="+Inf"} 1
gha_job_execution_duration_seconds_sum{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository=""} 0
gha_job_execution_duration_seconds_count{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository=""} 1
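Note that `_sum` is 0 while `_count` is 1, so the single recorded observation appears to be 0 seconds rather than 2. Prometheus histogram buckets are cumulative: the bucket with upper bound `le` counts every observation less than or equal to that bound, so an observation of 0 increments every bucket. The sketch below illustrates this, assuming bash and a POSIX awk; `cumulative_buckets` is a hypothetical helper, and the bounds passed to it are a subset of the ones the listener exposes.

```shell
#!/usr/bin/env bash
# cumulative_buckets (hypothetical helper): takes one observed value in seconds,
# then a list of bucket upper bounds, and prints "le count" for a histogram that
# holds only that single observation. Buckets are cumulative, so the count is 1
# whenever value <= le.
cumulative_buckets() {
  local value="$1"
  shift
  local le
  for le in "$@"; do
    if [ "$le" = "+Inf" ]; then
      # The +Inf bucket always contains every observation.
      echo "$le 1"
    else
      # awk does the fractional comparison that shell integer arithmetic cannot.
      awk -v v="$value" -v le="$le" 'BEGIN { c = (v + 0 <= le + 0) ? 1 : 0; print le, c }'
    fi
  done
}
```

With `cumulative_buckets 0 0.01 1 2 3 +Inf` every line ends in 1, like the output above; with 2 as the observation, the lowest bucket holding it is `le="2"`, which is the behaviour the expected output below describes.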

The same applies to the startup duration histogram:

# HELP gha_job_startup_duration_seconds Time spent waiting for workflow job to get started on the runner owned by the scale set (in seconds).
# TYPE gha_job_startup_duration_seconds histogram
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="0.01"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="0.05"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="0.1"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="0.5"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="1"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="2"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="3"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="4"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="5"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="6"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="7"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="8"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="9"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="10"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="12"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="15"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="18"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="20"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="25"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="30"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="40"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="50"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="60"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="70"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="80"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="90"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="100"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="110"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="120"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="150"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="180"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="210"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="240"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="300"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="360"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="420"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="480"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="540"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="600"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="900"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="1200"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="1800"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="2400"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="3000"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="3600"} 1
gha_job_startup_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository="",le="+Inf"} 1
gha_job_startup_duration_seconds_sum{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository=""} 0
gha_job_startup_duration_seconds_count{enterprise="<your-enterprise>",event_name="workflow_dispatch",organization="",repository=""} 1

Describe the expected behavior

For each job, only the buckets whose upper bound (le) is greater than or equal to the job's execution/startup time should be incremented.

Example:

On a fresh listener pod, after a single job with an execution time of 2 seconds completes, the bucket metrics should be emitted as follows:

# HELP gha_job_execution_duration_seconds Time spent executing workflow jobs by the scale set (in seconds).
# TYPE gha_job_execution_duration_seconds histogram
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="0.01"} 0
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="0.05"} 0
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="0.1"} 0
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="0.5"} 0
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="1"} 0
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="2"} 1 <------ <Expected increment start>
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="3"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="4"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="5"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="6"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="7"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="8"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="9"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="10"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="12"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="15"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="18"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="20"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="25"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="30"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="40"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="50"} 1
gha_job_execution_duration_seconds_bucket{enterprise="<your-enterprise>",event_name="workflow_dispatch",job_name="print",job_result="succeeded",job_workflow_ref="<your-workflow-ref>",organization="",repository="",le="60"} 1

... (truncated for brevity)

Additional Context

## controller.yaml

metrics:
  controllerManagerAddr: ":8080"
  listenerAddr: ":8080"
  listenerEndpoint: "/metrics"

## runner-set.yaml

runnerScaleSetName: test-runner-set
githubConfigUrl: "https://<your-github-host>/enterprises/<your-enterprise>"
githubConfigSecret:
  github_token: <gh-token>

minRunners: 1
maxRunners: 1

containerMode:
  type: "dind"

Controller Logs

https://gist.github.com/1MaxKoval/7c875cc4486810e444e5b23b22512802

P.S. The listener logs and the Prometheus endpoint output are also in that gist.

Runner Pod Logs

https://gist.github.com/1MaxKoval/8b74b22d4689c5906ff290a436ced41b

1MaxKoval, Dec 20 '23

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

github-actions[bot], Dec 20 '23

For context, I noted a similar issue in my comment here:

  • https://github.com/actions/actions-runner-controller/pull/3003#issuecomment-1846138389

I haven't yet confirmed whether what you're seeing here is the same as what I'm seeing, or whether it explains the unexpectedly high cardinality I'm getting on gha_job_execution_duration_seconds 🤷

MPV, Jan 10 '24

Hey, are you by any chance running GitHub Enterprise Server 3.9?

nikola-jokic, Apr 26 '24

@nikola-jokic I'm still seeing this exact issue with the controller and scale set version 0.9.1 on GHE 3.10.3.

chotiwat, May 10 '24

I dug into it a bit and I can't find these time fields being emitted anywhere: https://github.com/actions/actions-runner-controller/blob/a1b8e0cc3d280cfae73a4c1dc24dc49da371d1d1/github/actions/types.go#L66-L69

Are they supposed to be in the EphemeralRunnerStatus?

https://github.com/actions/actions-runner-controller/blob/a1b8e0cc3d280cfae73a4c1dc24dc49da371d1d1/apis/actions.github.com/v1alpha1/ephemeralrunner_types.go#L76-L124

The EphemeralRunnerStatus doesn't seem to get updated for the JobCompleted case.

https://github.com/actions/actions-runner-controller/blob/a1b8e0cc3d280cfae73a4c1dc24dc49da371d1d1/cmd/githubrunnerscalesetlistener/autoScalerService.go#L163-L193

I'm not familiar with the code base so I might be completely wrong though.

For context, we are migrating from runner deployments to autoscaling runner sets. These histogram metrics would be very helpful for replacing our custom run duration metrics computed from the GitHub APIs.

chotiwat, May 10 '24

Hey @chotiwat,

Starting with GitHub Enterprise Server 3.11, the histogram metrics are available. On earlier versions, these fields are not communicated to the scale set, so they are always set to 0. I have opened a docs PR to document this behavior.
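Given this explanation, one way to tell whether a listener is affected is to look for duration histograms whose `_count` is non-zero while `_sum` is still 0. This is a minimal sketch, assuming bash and a POSIX awk; `find_zero_sum_histograms` is a hypothetical helper that reads Prometheus text on stdin and prints the affected series. A zero sum is a heuristic, not proof that the server version is the cause.

```shell
#!/usr/bin/env bash
# find_zero_sum_histograms (hypothetical helper): read Prometheus text on stdin and
# print every gha_*_duration_seconds series whose _count is non-zero while its
# _sum is exactly 0, i.e. every recorded duration was 0 seconds.
find_zero_sum_histograms() {
  awk '
    /^#/ { next }  # skip HELP/TYPE comment lines
    /^gha_[a-z_]*_duration_seconds_(sum|count)[{]/ {
      value = $NF                          # sample value is the last field
      series = $0
      sub(/ [^ ]+$/, "", series)           # drop the sample value
      kind = (series ~ /_sum[{]/) ? "sum" : "count"
      sub(/_(sum|count)[{]/, "{", series)  # key _sum and _count by the same series
      if (kind == "sum") sums[series] = value + 0
      else counts[series] = value + 0
    }
    END {
      for (s in counts)
        if (counts[s] > 0 && (s in sums) && sums[s] == 0) print s
    }
  '
}
```

For example, pipe `curl -s http://localhost:<your-local-port>/metrics` into `find_zero_sum_histograms`; on a listener connected to a GitHub Enterprise Server older than 3.11, one would expect both duration histograms to be listed once jobs have run.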

nikola-jokic, May 13 '24

Thank you all for raising this issue. The docs updates are in, so I will close it now :relaxed:. Sorry this wasn't documented earlier :disappointed:

nikola-jokic, May 14 '24