SummerWind Webhook autoscaler kills running jobs when `runs-on` is ambiguous

Open Nuru opened this issue 2 years ago • 1 comments

One consequence of this bug is that runners can be arbitrarily killed while running jobs. It happens in variations on a pattern like this:

You have 2 runner pools, A and B, with no idle runners configured.
You launch 6 jobs with runs-on: ["self-hosted"], meaning they can run on A or B. By chance, the webhook autoscaler scales up both A and B to 3 runners each, and the 6 jobs are picked up by the runners.
The 3 jobs on B finish, and the webhook gets 3 job completed events. Unfortunately, the webhook scales down A instead of B, meaning it kills the 3 jobs running in A, leaving them in an error state ("The self-hosted runner: xxx lost communication with the server. Verify the machine is running and has a healthy network connection...").
Although the 3 jobs on B have finished, the 3 reservations for those jobs have not been deleted, because B has not been scaled down, so 3 idle runners stay provisioned in B until their reservations expire.
The 3 jobs on A finish, but they also result in scale-down of A, which is already at zero, so the job completed events are effectively ignored.

End result: 3 jobs killed, 3 unwanted runners running idle.
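
The sequence above can be replayed with a toy model. This is only an illustrative sketch, not ARC code: the `Pool` class, the job names, and the rule that a scale-down kills whichever runners exceed the new replica count are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical runner pool: desired replicas plus the jobs running on them."""
    name: str
    desired: int = 0
    running_jobs: list = field(default_factory=list)

    def scale_down(self, amount):
        """Lower desired replicas and return the jobs whose runners were removed."""
        self.desired = max(0, self.desired - amount)
        killed = self.running_jobs[self.desired:]
        self.running_jobs = self.running_jobs[:self.desired]
        return killed


def simulate():
    a, b = Pool("A"), Pool("B")
    # Six queued jobs: assume the autoscaler happened to scale each pool up by 3.
    a.desired, b.desired = 3, 3
    a.running_jobs = ["job1", "job2", "job3"]
    b.running_jobs = ["job4", "job5", "job6"]

    # The 3 jobs on B complete, but each "completed" event scales down A.
    b.running_jobs = []
    killed = []
    for _ in range(3):
        killed += a.scale_down(1)

    # The "completed" events for A's jobs arrive next; A is already at zero.
    for _ in range(3):
        a.scale_down(1)

    idle_in_b = b.desired - len(b.running_jobs)
    return killed, idle_in_b
```

Calling `simulate()` in this model returns three killed jobs from A and three idle runners left in B, which is the end result described above.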

Checks

[X] I've already read https://github.com/actions/actions-runner-controller/blob/master/TROUBLESHOOTING.md and I'm sure my issue is not covered in the troubleshooting guide.
[X] I'm not using a custom entrypoint in my runner image

Controller Version

0.27.4 (still happens with 0.27.6)

Helm Chart Version

0.23.3 (still happens with 0.23.7)

CertManager Version

1.10.2

Deployment Method

Helm

cert-manager installation

I'm certain cert-manager is working properly, we use it for other things.

Checks

[X] This isn't a question or user support case (For Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is so critical that you need priority support)
[X] I've read the release notes before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
[X] My actions-runner-controller version (v0.x.y) does support the feature
[X] I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
[X] I've migrated to the workflow job webhook event (if you're using webhook-driven scaling)

Resource Definitions

HorizontalRunnerAutoscaler

apiVersion: v1
items:
- apiVersion: actions.summerwind.dev/v1alpha1
  kind: HorizontalRunnerAutoscaler
  metadata:
    annotations:
      meta.helm.sh/release-name: infra-runner-amd64-large
      meta.helm.sh/release-namespace: actions-runner-system
    creationTimestamp: "2023-04-21T20:22:36Z"
    generation: 1429
    labels:
      app.kubernetes.io/managed-by: Helm
      k8slens-edit-resource-version: v1alpha1
    name: infra-runner-amd64-large
    namespace: actions-runner-system
    resourceVersion: "98605291"
    uid: ea9a0690-e332-4970-8262-a2bf8df39b68
  spec:
    maxReplicas: 8
    minReplicas: 0
    scaleDownDelaySecondsAfterScaleOut: 300
    scaleTargetRef:
      name: infra-runner-amd64-large
    scaleUpTriggers:
    - amount: 1
      duration: 3h60m
      githubEvent:
        workflowJob: {}
  status:
    desiredReplicas: 0
    lastSuccessfulScaleOutTime: "2023-08-05T23:10:58Z"
- apiVersion: actions.summerwind.dev/v1alpha1
  kind: HorizontalRunnerAutoscaler
  metadata:
    annotations:
      meta.helm.sh/release-name: infra-runner-amd64-medium
      meta.helm.sh/release-namespace: actions-runner-system
    creationTimestamp: "2023-04-19T23:54:18Z"
    generation: 1174
    labels:
      app.kubernetes.io/managed-by: Helm
      k8slens-edit-resource-version: v1alpha1
    name: infra-runner-amd64-medium
    namespace: actions-runner-system
    resourceVersion: "98607727"
    uid: f7aa1d23-c8ff-422d-8eb9-7df1eb66f454
  spec:
    maxReplicas: 8
    minReplicas: 0
    scaleDownDelaySecondsAfterScaleOut: 300
    scaleTargetRef:
      name: infra-runner-amd64-medium
    scaleUpTriggers:
    - amount: 1
      duration: 60m
      githubEvent:
        workflowJob: {}
  status:
    desiredReplicas: 0
    lastSuccessfulScaleOutTime: "2023-08-06T01:00:36Z"
- apiVersion: actions.summerwind.dev/v1alpha1
  kind: HorizontalRunnerAutoscaler
  metadata:
    annotations:
      meta.helm.sh/release-name: infra-runner-amd64-small
      meta.helm.sh/release-namespace: actions-runner-system
    creationTimestamp: "2023-04-19T23:54:18Z"
    generation: 6427
    labels:
      app.kubernetes.io/managed-by: Helm
    name: infra-runner-amd64-small
    namespace: actions-runner-system
    resourceVersion: "98602624"
    uid: 37cd150c-7509-46a4-a575-0084417fd9cc
  spec:
    capacityReservations:
    - effectiveTime: "2023-08-06T00:21:14Z"
      expirationTime: "2023-08-06T00:51:14Z"
      replicas: 1
    maxReplicas: 5
    minReplicas: 1
    scaleDownDelaySecondsAfterScaleOut: 300
    scaleTargetRef:
      name: infra-runner-amd64-small
    scaleUpTriggers:
    - amount: 1
      duration: 30m
      githubEvent:
        workflowJob: {}
  status:
    desiredReplicas: 1
    lastSuccessfulScaleOutTime: "2023-08-06T00:21:14Z"
- apiVersion: actions.summerwind.dev/v1alpha1
  kind: HorizontalRunnerAutoscaler
  metadata:
    annotations:
      meta.helm.sh/release-name: infra-runner-arm64
      meta.helm.sh/release-namespace: actions-runner-system
    creationTimestamp: "2023-03-28T03:21:51Z"
    generation: 4437
    labels:
      app.kubernetes.io/managed-by: Helm
    name: infra-runner-arm64
    namespace: actions-runner-system
    resourceVersion: "98588411"
    uid: 3e524b3a-5b26-4059-a22d-ec713ff309d2
  spec:
    maxReplicas: 128
    minReplicas: 0
    scaleDownDelaySecondsAfterScaleOut: 300
    scaleTargetRef:
      name: infra-runner-arm64
    scaleUpTriggers:
    - amount: 1
      duration: 45m
      githubEvent:
        workflowJob: {}
  status:
    desiredReplicas: 0
    lastSuccessfulScaleOutTime: "2023-08-06T00:03:44Z"
kind: List
metadata:
  resourceVersion: ""

RunnerDeployments

apiVersion: v1
items:
- apiVersion: actions.summerwind.dev/v1alpha1
  kind: RunnerDeployment
  metadata:
    annotations:
      meta.helm.sh/release-name: infra-runner-amd64-large
      meta.helm.sh/release-namespace: actions-runner-system
    creationTimestamp: "2023-04-21T20:22:36Z"
    generation: 1372
    labels:
      app.kubernetes.io/managed-by: Helm
    name: infra-runner-amd64-large
    namespace: actions-runner-system
    resourceVersion: "98605451"
    uid: 3a94f97e-4fe6-4d3a-b73d-77dd649fa813
  spec:
    effectiveTime: "2023-08-05T08:03:29Z"
    replicas: 0
    template:
      metadata:
        annotations:
          karpenter.sh/do-not-evict: "true"
      spec:
        dockerdWithinRunnerContainer: true
        env:
        - name: RUNNER_GRACEFUL_STOP_TIMEOUT
          value: "90"
        group: amd64-large
        image: ghcr.io/actions-runner-controller/actions-runner-controller/actions-runner-dind:v2.307.1-ubuntu-20.04
        imagePullPolicy: IfNotPresent
        labels:
        - self-hosted
        - Linux
        - linux
        - Ubuntu
        - ubuntu
        - X64
        - x64
        - x86_64
        - amd64
        - AMD64
        - large
        nodeSelector:
          kubernetes.io/arch: amd64
          kubernetes.io/os: linux
        organization: my-organization
        resources:
          limits:
            cpu: 6000m
            memory: 7680Mi
          requests:
            cpu: 4000m
            memory: 7680Mi
        serviceAccountName: actions-runner
        terminationGracePeriodSeconds: 100
        volumeMounts:
        - mountPath: /home/runner/work/shared
          name: shared-volume
        volumes:
        - name: shared-volume
          persistentVolumeClaim:
            claimName: infra-runner-amd64-large
  status:
    availableReplicas: 0
    desiredReplicas: 0
    readyReplicas: 0
    replicas: 0
    updatedReplicas: 0
- apiVersion: actions.summerwind.dev/v1alpha1
  kind: RunnerDeployment
  metadata:
    annotations:
      meta.helm.sh/release-name: infra-runner-amd64-medium
      meta.helm.sh/release-namespace: actions-runner-system
    creationTimestamp: "2023-04-19T23:54:18Z"
    generation: 1064
    labels:
      app.kubernetes.io/managed-by: Helm
    name: infra-runner-amd64-medium
    namespace: actions-runner-system
    resourceVersion: "98607782"
    uid: ebef8bc4-0e74-437c-b139-9b1ac083815e
  spec:
    effectiveTime: "2023-08-06T01:00:36Z"
    replicas: 0
    template:
      metadata:
        annotations:
          karpenter.sh/do-not-evict: "true"
      spec:
        dockerdWithinRunnerContainer: true
        env:
        - name: RUNNER_GRACEFUL_STOP_TIMEOUT
          value: "90"
        image: ghcr.io/actions-runner-controller/actions-runner-controller/actions-runner-dind:v2.307.1-ubuntu-20.04
        imagePullPolicy: IfNotPresent
        labels:
        - self-hosted
        - Linux
        - linux
        - Ubuntu
        - ubuntu
        - X64
        - x64
        - x86_64
        - amd64
        - AMD64
        - core-auto
        - medium
        nodeSelector:
          kubernetes.io/arch: amd64
          kubernetes.io/os: linux
        organization: my-organization
        resources:
          limits:
            cpu: 3000m
            memory: 3072Mi
          requests:
            cpu: 1500m
            memory: 1536Mi
        serviceAccountName: actions-runner
        terminationGracePeriodSeconds: 100
        volumeMounts:
        - mountPath: /home/runner/work/shared
          name: shared-volume
        volumes:
        - name: shared-volume
          persistentVolumeClaim:
            claimName: infra-runner-amd64-medium
  status:
    availableReplicas: 0
    desiredReplicas: 0
    readyReplicas: 0
    replicas: 0
    updatedReplicas: 0
- apiVersion: actions.summerwind.dev/v1alpha1
  kind: RunnerDeployment
  metadata:
    annotations:
      meta.helm.sh/release-name: infra-runner-amd64-small
      meta.helm.sh/release-namespace: actions-runner-system
    creationTimestamp: "2023-04-19T23:54:18Z"
    generation: 4566
    labels:
      app.kubernetes.io/managed-by: Helm
    name: infra-runner-amd64-small
    namespace: actions-runner-system
    resourceVersion: "98605914"
    uid: 07b9d677-11e5-488f-aa57-ddd41e3891a9
  spec:
    effectiveTime: "2023-08-06T00:21:14Z"
    replicas: 1
    template:
      spec:
        dockerdWithinRunnerContainer: true
        env:
        - name: RUNNER_GRACEFUL_STOP_TIMEOUT
          value: "90"
        image: ghcr.io/actions-runner-controller/actions-runner-controller/actions-runner-dind:v2.307.1-ubuntu-20.04
        imagePullPolicy: IfNotPresent
        labels:
        - self-hosted
        - Linux
        - linux
        - Ubuntu
        - ubuntu
        - X64
        - x64
        - x86_64
        - amd64
        - AMD64
        - core-auto
        - common
        - default
        - small
        nodeSelector:
          kubernetes.io/arch: amd64
          kubernetes.io/os: linux
        organization: my-organization
        resources:
          limits:
            cpu: 1000m
            memory: 1024Mi
          requests:
            cpu: 500m
            memory: 256Mi
        serviceAccountName: actions-runner
        terminationGracePeriodSeconds: 100
        volumeMounts:
        - mountPath: /home/runner/work/shared
          name: shared-volume
        volumes:
        - name: shared-volume
          persistentVolumeClaim:
            claimName: infra-runner-amd64-small
  status:
    availableReplicas: 1
    desiredReplicas: 1
    readyReplicas: 1
    replicas: 1
    updatedReplicas: 1
- apiVersion: actions.summerwind.dev/v1alpha1
  kind: RunnerDeployment
  metadata:
    annotations:
      meta.helm.sh/release-name: infra-runner-arm64
      meta.helm.sh/release-namespace: actions-runner-system
    creationTimestamp: "2023-03-28T03:21:51Z"
    generation: 3836
    labels:
      app.kubernetes.io/managed-by: Helm
    name: infra-runner-arm64
    namespace: actions-runner-system
    resourceVersion: "98604888"
    uid: 3eb2f283-ddd1-4a03-b244-4ff65031c03f
  spec:
    effectiveTime: "2023-08-06T00:03:44Z"
    replicas: 0
    template:
      metadata:
        annotations:
          karpenter.sh/do-not-evict: "true"
      spec:
        dockerdWithinRunnerContainer: true
        env:
        - name: RUNNER_GRACEFUL_STOP_TIMEOUT
          value: "90"
        group: armEnabled
        image: ghcr.io/actions-runner-controller/actions-runner-controller/actions-runner-dind:v2.307.1-ubuntu-20.04
        imagePullPolicy: IfNotPresent
        labels:
        - self-hosted
        - Linux
        - linux
        - Ubuntu
        - ubuntu
        - arm64
        - ARM64
        - aarch64
        - core-auto
        - small
        - medium
        - large
        - packages
        nodeSelector:
          kubernetes.io/arch: arm64
          kubernetes.io/os: linux
        organization: my-organization
        resources:
          limits:
            cpu: 2000m
            memory: 2048Mi
          requests:
            cpu: 250m
            memory: 512Mi
        serviceAccountName: actions-runner
        terminationGracePeriodSeconds: 100
        tolerations:
        - effect: NoSchedule
          key: kubernetes.io/arch
          operator: Equal
          value: arm64
        volumeMounts:
        - mountPath: /home/runner/work/shared
          name: shared-volume
        volumes:
        - name: shared-volume
          persistentVolumeClaim:
            claimName: infra-runner-arm64
  status:
    availableReplicas: 0
    desiredReplicas: 0
    readyReplicas: 0
    replicas: 0
    updatedReplicas: 0
kind: List
metadata:
  resourceVersion: ""

To Reproduce

Deploy multiple HRAs and RunnerDeployments with different label sets. Make sure that at least 2 deployments are not assigned group names and are thus in the Default group.
Run several jobs with `runs-on: ["self-hosted"]`.

Describe the bug

When a job's `runs-on` spec matches multiple runner deployments and the HRAs are using webhook-based autoscaling, the HRA will unpredictably pick one of the deployments in the Default group to scale up or down, although the job may in fact be picked up by any matching deployment.

If there is only one RunnerDeployment in the Default group, then I expect (but have not tested) that this deployment will be the one consistently scaled up and down, but again, it will not necessarily be the deployment that actually gets assigned the job.
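
To make the ambiguity concrete, here is a rough sketch of label matching against the four RunnerDeployments listed under Resource Definitions. The label sets and group names are taken from those definitions (lowercased and de-duplicated). The subset rule, the case-insensitive comparison, and the `default_group_only` filter are my assumptions about how matching behaves, not code from ARC or GitHub.

```python
# Label sets from the RunnerDeployments in this issue, lowercased and de-duplicated.
AMD64 = {"self-hosted", "linux", "ubuntu", "x64", "x86_64", "amd64"}
ARM64 = {"self-hosted", "linux", "ubuntu", "arm64", "aarch64"}

DEPLOYMENTS = {
    "infra-runner-amd64-large": {
        "group": "amd64-large",
        "labels": AMD64 | {"large"},
    },
    "infra-runner-amd64-medium": {
        "group": None,  # no group set, so it is in the Default group
        "labels": AMD64 | {"core-auto", "medium"},
    },
    "infra-runner-amd64-small": {
        "group": None,  # no group set, so it is in the Default group
        "labels": AMD64 | {"core-auto", "common", "default", "small"},
    },
    "infra-runner-arm64": {
        "group": "armEnabled",
        "labels": ARM64 | {"core-auto", "small", "medium", "large", "packages"},
    },
}


def matching_deployments(runs_on, deployments, default_group_only=False):
    """Return names of deployments whose labels include every label in runs_on."""
    wanted = {label.lower() for label in runs_on}
    return sorted(
        name
        for name, spec in deployments.items()
        if wanted <= spec["labels"]
        and not (default_group_only and spec["group"] is not None)
    )
```

Under these assumptions, `matching_deployments(["self-hosted"], DEPLOYMENTS)` returns all four deployments, and restricting to the Default group still leaves both `infra-runner-amd64-medium` and `infra-runner-amd64-small`, so the label alone cannot identify a single scale target.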

Describe the expected behavior

If a job matches multiple runner deployments, the HRA should, at a minimum, consistently pick the same deployment to scale up and scale down. This way, if no deployments have idle runners, autoscaling should work acceptably, as jobs would get picked up by the deployment being scaled up, and that deployment would be scaled down when jobs complete.
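
A minimal sketch of what "consistently pick the same deployment" could mean, assuming the candidates arrive as a list of deployment names in arbitrary order. The function name and the choice of lexicographic ordering as tie-breaker are hypothetical. Any stable rule would do, as long as the queued and completed events resolve to the same target.

```python
def pick_scale_target(candidates):
    """Return the same deployment for the same candidate set, whatever the input order."""
    if not candidates:
        return None
    # Lexicographic minimum is an arbitrary but stable tie-breaker.
    return min(candidates)
```

With a rule like this, `pick_scale_target(["infra-runner-amd64-small", "infra-runner-amd64-medium"])` and the same call with the list reversed both resolve to `infra-runner-amd64-medium`, so scale-up and scale-down would at least hit the same HRA.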

Ideally, when a job is completed, the HRA would match the job ID to a specific Pod and capacity reservation, delete the capacity reservation and scale down the deployment it is in.
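
Roughly, the bookkeeping I have in mind looks like the sketch below. Everything here is hypothetical: the `Autoscaler` class, its method names, and the idea that the controller can learn which runner ran a job and which deployment owns that runner. It only illustrates releasing capacity in the deployment that actually ran the job, not how ARC stores capacity reservations.

```python
from collections import Counter


class Autoscaler:
    """Hypothetical bookkeeping: release capacity where the job actually ran."""

    def __init__(self):
        self.reserved = Counter()  # deployment name -> reserved replicas
        self.runner_owner = {}     # runner (pod) name -> deployment name

    def job_queued(self, deployment):
        """Reserve one replica in the deployment chosen for scale-up."""
        self.reserved[deployment] += 1

    def runner_started(self, runner_name, deployment):
        """Remember which deployment a runner pod belongs to."""
        self.runner_owner[runner_name] = deployment

    def job_completed(self, runner_name):
        """Release one replica in the deployment that owns the finished runner."""
        deployment = self.runner_owner.pop(runner_name, None)
        if deployment is not None and self.reserved[deployment] > 0:
            self.reserved[deployment] -= 1
        return deployment
```

In the A/B scenario from the top of this issue, completing the three jobs on B's runners would then drop B's reservations to zero and leave A's three untouched.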

Whole Controller Logs

Note that the job ran on infra-runner-amd64-small but the HRA scaled infra-runner-amd64-medium.

2023-08-06T00:58:03Z	INFO	-github-webhook-secret-token and GITHUB_WEBHOOK_SECRET_TOKEN are missing or empty. Create one following https://docs.github.com/en/developers/webhooks-and-events/securing-your-webhooks and specify it via the flag or the envvar
2023-08-06T00:58:03Z	INFO	-watch-namespace is %q. Only HorizontalRunnerAutoscalers in %q are watched, cached, and considered as scale targets.	{"actions-runner-system": "actions-runner-system"}
2023-08-06T00:58:04Z	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
2023-08-06T00:58:04Z	INFO	starting webhook server
2023-08-06T00:58:04Z	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
2023-08-06T00:58:04Z	INFO	Starting EventSource	{"controller": "webhookbasedautoscaler", "controllerGroup": "actions.summerwind.dev", "controllerKind": "HorizontalRunnerAutoscaler", "source": "kind source: *v1alpha1.HorizontalRunnerAutoscaler"}
2023-08-06T00:58:04Z	INFO	Starting Controller	{"controller": "webhookbasedautoscaler", "controllerGroup": "actions.summerwind.dev", "controllerKind": "HorizontalRunnerAutoscaler"}
2023-08-06T00:58:04Z	INFO	Starting workers	{"controller": "webhookbasedautoscaler", "controllerGroup": "actions.summerwind.dev", "controllerKind": "HorizontalRunnerAutoscaler", "worker count": 1}
2023-08-06T01:00:32Z	DEBUG	controllers.webhookbasedautoscaler	Found 0 HRAs by key	{"key": "my-organization/action-test"}
2023-08-06T01:00:32Z	DEBUG	controllers.webhookbasedautoscaler	Found some runner groups are managed by ARC	{"event": "workflow_job", "hookID": "386548876", "delivery": "a47ff4b0-33f4-11ee-9c98-cc4bfd7f2014", "workflowJob.status": "queued", "workflowJob.labels": ["self-hosted"], "repository.name": "action-test", "repository.owner.login": "my-organization", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "queued", "workflowJob.runID": 5773730087, "workflowJob.ID": 15649994127, "groups": "RunnerGroup{Scope:Organization, Kind:Default, Name:}, RunnerGroup{Scope:Organization, Kind:Default, Name:}, RunnerGroup{Scope:Organization, Kind:Custom, Name:armEnabled}, RunnerGroup{Scope:Organization, Kind:Custom, Name:amd64-large}"}
2023-08-06T01:00:33Z	DEBUG	controllers.webhookbasedautoscaler	Searching in runner groups	{"event": "workflow_job", "hookID": "386548876", "delivery": "a47ff4b0-33f4-11ee-9c98-cc4bfd7f2014", "workflowJob.status": "queued", "workflowJob.labels": ["self-hosted"], "repository.name": "action-test", "repository.owner.login": "my-organization", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "queued", "workflowJob.runID": 5773730087, "workflowJob.ID": 15649994127, "groups": "RunnerGroup{Scope:Organization, Kind:Default, Name:}, RunnerGroup{Scope:Organization, Kind:Custom, Name:armEnabled}, RunnerGroup{Scope:Organization, Kind:Custom, Name:amd64-large}"}
2023-08-06T01:00:33Z	DEBUG	controllers.webhookbasedautoscaler	groups	{"event": "workflow_job", "hookID": "386548876", "delivery": "a47ff4b0-33f4-11ee-9c98-cc4bfd7f2014", "workflowJob.status": "queued", "workflowJob.labels": ["self-hosted"], "repository.name": "action-test", "repository.owner.login": "my-organization", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "queued", "workflowJob.runID": 5773730087, "workflowJob.ID": 15649994127, "groups": "RunnerGroup{Scope:Organization, Kind:Default, Name:}, RunnerGroup{Scope:Organization, Kind:Custom, Name:armEnabled}, RunnerGroup{Scope:Organization, Kind:Custom, Name:amd64-large}"}
2023-08-06T01:00:33Z	DEBUG	controllers.webhookbasedautoscaler	Found 2 HRAs by key	{"key": "my-organization"}
2023-08-06T01:00:33Z	DEBUG	controllers.webhookbasedautoscaler	job scale up target found	{"event": "workflow_job", "hookID": "386548876", "delivery": "a47ff4b0-33f4-11ee-9c98-cc4bfd7f2014", "workflowJob.status": "queued", "workflowJob.labels": ["self-hosted"], "repository.name": "action-test", "repository.owner.login": "my-organization", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "queued", "workflowJob.runID": 5773730087, "workflowJob.ID": 15649994127, "enterprise": "", "organization": "my-organization", "repository": "action-test", "key": "my-organization"}
2023-08-06T01:00:33Z	INFO	controllers.webhookbasedautoscaler	scaled infra-runner-amd64-medium by 1	{"event": "workflow_job", "hookID": "386548876", "delivery": "a47ff4b0-33f4-11ee-9c98-cc4bfd7f2014", "workflowJob.status": "queued", "workflowJob.labels": ["self-hosted"], "repository.name": "action-test", "repository.owner.login": "my-organization", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "queued", "workflowJob.runID": 5773730087, "workflowJob.ID": 15649994127}
2023-08-06T01:00:33Z	INFO	controllers.webhookbasedautoscaler	Starting batch worker
2023-08-06T01:00:36Z	DEBUG	controllers.webhookbasedautoscaler	Patching hra infra-runner-amd64-medium for capacityReservations update	{"before": 0, "expired": -1, "added": 1, "completed": 0, "after": 1}
2023-08-06T01:00:40Z	DEBUG	controllers.webhookbasedautoscaler	Found 0 HRAs by key	{"key": "my-organization/action-test"}
2023-08-06T01:00:40Z	DEBUG	controllers.webhookbasedautoscaler	Found some runner groups are managed by ARC	{"event": "workflow_job", "hookID": "386548876", "delivery": "a942fb00-33f4-11ee-9d84-8c604b1056fa", "workflowJob.status": "completed", "workflowJob.labels": ["self-hosted"], "repository.name": "action-test", "repository.owner.login": "my-organization", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "completed", "workflowJob.runID": 5773730087, "workflowJob.ID": 15649994127, "groups": "RunnerGroup{Scope:Organization, Kind:Default, Name:}, RunnerGroup{Scope:Organization, Kind:Default, Name:}, RunnerGroup{Scope:Organization, Kind:Custom, Name:amd64-large}, RunnerGroup{Scope:Organization, Kind:Custom, Name:armEnabled}"}
2023-08-06T01:00:40Z	DEBUG	controllers.webhookbasedautoscaler	Searching in runner groups	{"event": "workflow_job", "hookID": "386548876", "delivery": "a942fb00-33f4-11ee-9d84-8c604b1056fa", "workflowJob.status": "completed", "workflowJob.labels": ["self-hosted"], "repository.name": "action-test", "repository.owner.login": "my-organization", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "completed", "workflowJob.runID": 5773730087, "workflowJob.ID": 15649994127, "groups": "RunnerGroup{Scope:Organization, Kind:Default, Name:}, RunnerGroup{Scope:Organization, Kind:Custom, Name:armEnabled}, RunnerGroup{Scope:Organization, Kind:Custom, Name:amd64-large}"}
2023-08-06T01:00:40Z	DEBUG	controllers.webhookbasedautoscaler	groups	{"event": "workflow_job", "hookID": "386548876", "delivery": "a942fb00-33f4-11ee-9d84-8c604b1056fa", "workflowJob.status": "completed", "workflowJob.labels": ["self-hosted"], "repository.name": "action-test", "repository.owner.login": "my-organization", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "completed", "workflowJob.runID": 5773730087, "workflowJob.ID": 15649994127, "groups": "RunnerGroup{Scope:Organization, Kind:Default, Name:}, RunnerGroup{Scope:Organization, Kind:Custom, Name:armEnabled}, RunnerGroup{Scope:Organization, Kind:Custom, Name:amd64-large}"}
2023-08-06T01:00:40Z	DEBUG	controllers.webhookbasedautoscaler	Found 2 HRAs by key	{"key": "my-organization"}
2023-08-06T01:00:40Z	DEBUG	controllers.webhookbasedautoscaler	job scale up target found	{"event": "workflow_job", "hookID": "386548876", "delivery": "a942fb00-33f4-11ee-9d84-8c604b1056fa", "workflowJob.status": "completed", "workflowJob.labels": ["self-hosted"], "repository.name": "action-test", "repository.owner.login": "my-organization", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "completed", "workflowJob.runID": 5773730087, "workflowJob.ID": 15649994127, "enterprise": "", "organization": "my-organization", "repository": "action-test", "key": "my-organization"}
2023-08-06T01:00:40Z	INFO	controllers.webhookbasedautoscaler	scaled infra-runner-amd64-medium by -1	{"event": "workflow_job", "hookID": "386548876", "delivery": "a942fb00-33f4-11ee-9d84-8c604b1056fa", "workflowJob.status": "completed", "workflowJob.labels": ["self-hosted"], "repository.name": "action-test", "repository.owner.login": "my-organization", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "completed", "workflowJob.runID": 5773730087, "workflowJob.ID": 15649994127}
2023-08-06T01:00:42Z	DEBUG	controllers.webhookbasedautoscaler	Patching hra infra-runner-amd64-medium for capacityReservations update	{"before": 1, "expired": 1, "added": 0, "completed": -1, "after": 0}
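
For anyone checking their own logs for the same mismatch, a small sketch that pulls the scale decisions out of the webhook server log. It assumes the tab-separated layout shown above (timestamp, level, logger, message, JSON fields) and the `scaled <name> by <n>` message format. The function name is made up for the example.

```python
import json
import re

# Matches the message column of lines such as "scaled infra-runner-amd64-medium by -1".
SCALED = re.compile(r"scaled (\S+) by (-?\d+)")


def scale_events(log_text):
    """Return (job ID, action, scaled HRA name, amount) for each scale decision."""
    events = []
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # no JSON payload column on this line
        match = SCALED.fullmatch(fields[3])
        if match is None:
            continue
        payload = json.loads(fields[4])
        events.append(
            (
                payload.get("workflowJob.ID"),
                payload.get("action"),
                match.group(1),
                int(match.group(2)),
            )
        )
    return events
```

Comparing the HRA names this returns with the runner name shown on the job's page in GitHub makes it easy to see when the scaled deployment is not the one that ran the job.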

Whole Runner Pod Logs

No relevant pod logs

Aug 06 '23 01:08 Nuru

@nikola-jokic @mumoshu Note that this is an issue for the Summerwind controller. It still applies to summerwind/actions-runner-controller:v0.27.6.

Jun 05 '24 21:06 Nuru