actions-runner-controller
HRA ignores webhook repository scale up events (but acks scaleDown)
Describe the bug
It appears that, under certain conditions, an HRA configured against a RunnerDeployment repository is not recognized on scaleUp events (but is recognized on scaleDown events).
Checks
- [x] My actions-runner-controller version (v0.x.y) does support the feature
- [ ] I'm using an unreleased version of the controller I built from HEAD of the default branch
To Reproduce
Steps to reproduce the behavior:
- Create a repo-only PAT on an account that has member access to an organization repo and is given admin rights on that repository. (Note: the user is not given admin access to the org.)
- Configure a webhook on the repo, set up actions-runner-controller, and install the PAT.
- Create a RunnerDeployment and HRA pointing at the repo:
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: bar-runner
spec:
  template:
    spec:
      repository: foo/bar
      ## using eks / oidc
      # image: image:tag
      # serviceAccountName: svc-account-name
      # securityContext:
      #   fsGroup: 1000
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: bar-hra
spec:
  scaleTargetRef:
    name: bar-runner
  scaleUpTriggers:
  - githubEvent: {}
  minReplicas: 0
  maxReplicas: 10
- Trigger a workflow, webhook controller should identify the HRA (but fail to take action):
Found 1 HRAs by key {"key": "foo/bar"}
Found 0 HRAs by key {"key": "foo"}
no repository runner or organizational runner found {"event": "workflow_job", "hookID": "332463031", "delivery": "37a4e5c0-591d-11ec-8c42-e1ea428feb2f", "workflowJob.status": "queued", "workflowJob.labels": ["self-hosted", "Linux", "x64"], "repository.name": "bar", "repository.owner.login": "foo", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "queued", "repository": "foo/bar", "organization": "foo"}
Scale target not found. If this is unexpected, ensure that there is exactly one repository-wide or organizational runner deployment that matches this webhook event {"event": "workflow_job", "hookID": "332463031", "delivery": "37a4e5c0-591d-11ec-8c42-e1ea428feb2f", "workflowJob.status": "queued", "workflowJob.labels": ["self-hosted", "Linux", "x64"], "repository.name": "bar", "repository.owner.login": "foo", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "queued"}
- Cancel the workflow, controller should both find the HRA and do something:
Found 1 HRAs by key {"key": "foo/bar"}
job scale up target is repository-wide runners {"event": "workflow_job", "hookID": "332463031", "delivery": "b68560f0-5921-11ec-8d7a-9f951ff061e6", "workflowJob.status": "completed", "workflowJob.labels": [], "repository.name": "bar", "repository.owner.login": "foo", "repository.owner.type": "Organization", "enterprise.slug": "", "action": "completed", "repository": "bar"}
Patching hra for capacityReservations update {"before": null, "after": null}
scaled bar-hra by -1
Expected behavior
I expect the HRA to scale up the RunnerDeployment.
Screenshots
n/a
Environment (please complete the following information):
- Helm Release - v0.15.1
- eks - v1.21
Additional context
This is weird.
this appears to be related to https://github.com/actions-runner-controller/actions-runner-controller/issues/951
@kcrawley-supernatural Hey! Thanks for reporting. As you've pointed out, yes, this seems related to #951.
Would you mind sharing your workflow YAML? I'm especially interested in what you wrote under the on field of the job that resulted in the workflow_job events you saw.
I'm mainly concerned about two things now:
- "workflowJob.labels": ["self-hosted", "Linux", "x64"] on the action=queued workflow_job event
  - actions-runner-controller assumes it to only contain ["self-hosted"] if you omitted runner.spec.labels (or runnerdeployment.spec.template.labels)
- "workflowJob.labels": [] on the action=canceled workflow_job event
  - the same as above
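To make the concern above concrete, here is a minimal sketch of a subset-style label check. It is not the controller's actual code: the function name `job_matches_runner` and the `IMPLICIT_LABELS` constant are hypothetical, and the only behavior taken from this thread is that a runner with no declared labels is assumed to offer just "self-hosted". The sketch compares labels exactly as written. Whether the real controller normalizes case is not something this thread establishes.

```python
from typing import Iterable

# Assumption from the comment above: with no labels declared on the
# Runner/RunnerDeployment, only "self-hosted" is assumed to be offered.
IMPLICIT_LABELS = frozenset({"self-hosted"})


def job_matches_runner(job_labels: Iterable[str],
                       declared_labels: Iterable[str] = ()) -> bool:
    """Return True when every label the job requests is offered by the runner."""
    offered = IMPLICIT_LABELS | set(declared_labels)
    return set(job_labels) <= offered


# Queued event from the report, RunnerDeployment with no labels declared.
print(job_matches_runner(["self-hosted", "Linux", "x64"]))  # False

# Same event once the RunnerDeployment declares the extra labels.
print(job_matches_runner(["self-hosted", "Linux", "x64"],
                         ["self-hosted", "Linux", "x64"]))  # True

# Event carrying an empty label list, as in the scale-down log.
print(job_matches_runner([]))  # True
```

Under this model an empty label list is trivially a subset of anything, which would be consistent with the scale-down event finding its target while the queued event did not.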
@kcrawley-supernatural Would you mind sharing your workflow definition YAML? I'm mainly interested in what you have under runs-on.
@mumoshu Sorry for ignoring this for so long.
It appears the workflow definition's runs-on must match the labels your actions-runner is configured with, e.g.:
workflow.yaml
jobs:
  task:
    runs-on: [self-hosted, Linux, x64]
runnerdeployment.yaml
kind: RunnerDeployment
metadata:
  name: task-runner
spec:
  template:
    spec:
      labels:
      - self-hosted
      - Linux
      - x64
Thank you for that, it resolved my issue.
I had also forgotten to add the webhook at the organization level (https://github.com/organizations/MY_ORG/settings/hooks). For some reason I thought the webhook should be set in the "Installed App" that was created. After setting the webhook at the organization level, everything worked like a charm.
This might be clear to others, but the key here is that if you use the automatically generated labels [self-hosted, Linux, X64] as part of the runs-on array, you must also include them in the RunnerDeployment spec. Otherwise the webhook server won't be able to find the HorizontalRunnerAutoscaler to scale.