actions-runner-controller horizontalrunnerautoscaler Detected job with no labels, which is not supported by ARC. Skipping anyway

Open mattpopa opened this issue 2 years ago • 3 comments

Checks

[X] I've already read https://github.com/actions/actions-runner-controller/blob/master/TROUBLESHOOTING.md and I'm sure my issue is not covered in the troubleshooting guide.
[X] I'm not using a custom entrypoint in my runner image

Controller Version

v0.27.4

Helm Chart Version

0.23.3

CertManager Version

No response

Deployment Method

Helm

cert-manager installation

Yes, cert-manager was installed using:

helm upgrade --install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.11.0 \
--set installCRDs=true --wait

Checks

[X] This isn't a question or user support case (For Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is so critical that you need priority support)
[X] I've read releasenotes before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
[X] My actions-runner-controller version (v0.x.y) does support the feature
[X] I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
[X] I've migrated to the workflow job webhook event (if you are using webhook-driven scaling)

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: self-hosted-large
  namespace: actions-runner-system
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      serviceAccountName: github-actions-sa
      securityContext:
        # For Ubuntu 20.04 runner
        fsGroup: 1000
      organization: my-org
      image: summerwind/actions-runner-dind:latest
      imagePullPolicy: IfNotPresent
      ephemeral: true
      dockerEnabled: false
      dockerdWithinRunnerContainer: true
      containers:
      - name: runner
        resources:
          requests:
            memory: "10Gi"
            cpu: "3000m"
          limits:
            memory: "10Gi"
            cpu: "3000m"
      labels:
        - large
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  namespace: actions-runner-system
  name: self-hosted-large
spec:
  scaleDownDelaySecondsAfterScaleOut: 10
  scaleTargetRef:
    kind: RunnerDeployment
    name: self-hosted-large
  minReplicas: 0
  maxReplicas: 6
  metrics:
    - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
      repositoryNames:
        - frontend

To Reproduce

this happens randomly, and the jobs have labels using this format:

runs-on: [self-hosted, large]

https://github.com/actions/actions-runner-controller/blob/032443fcfd4cf7b6e8bb09ed9dca639bcba9f8a4/controllers/actions.summerwind.net/autoscaling.go#L153



### Describe the bug

Randomly, the `horizontalrunnerautoscaler` doesn't update the desired replicas and the job waits indefinitely in GitHub:

Requested labels: self-hosted, large
Job defined at: my-org/frontend/.github/workflows/zcommon_web_e2e_tests.yml@refs/heads/master
Reusable workflow chain:
my-org/frontend/.github/workflows/web_scheduled_e2e.yml@refs/heads/master (a9790cfa59ca77ead2f8ec4987a9cac8e98cfcce)
-> my-org/frontend/.github/workflows/zcommon_web_e2e_tests.yml@refs/heads/master (a9790cfa59ca77ead2f8ec4987a9cac8e98cfcce)
Waiting for a runner to pick up this job...

and the job uses the following label format

runs-on: [self-hosted, large]


Should it make any difference to the `horizontalrunnerautoscaler` whether the labels are written with or without quotes?

runs-on: [self-hosted, large]

vs

runs-on: ["self-hosted", "large"]

?
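On the quoting question: in YAML, `[self-hosted, large]` and `["self-hosted", "large"]` both parse to the same list of two strings, so the quotes should not change what the autoscaler sees. What the log message does suggest is that the job came back from the API with an empty label list. Below is a rough Python sketch of how such a check could behave. The function names and the matching rules (ignoring `self-hosted`, clamping to `minReplicas`/`maxReplicas`) are assumptions for illustration, not the controller's actual Go code.

```python
def job_matches_runner(job_labels, runner_labels):
    """Return True if a job would be counted for this runner set (assumed logic)."""
    if not job_labels:
        # Mirrors the log message in this issue: a job with no labels is skipped.
        return False
    wanted = set(runner_labels)
    # Assumption: "self-hosted" is implied for every runner, so it is ignored here.
    return all(label == "self-hosted" or label in wanted for label in job_labels)


def desired_replicas(jobs, runner_labels, min_replicas, max_replicas):
    """Count matching jobs and clamp the result to the configured bounds (assumed)."""
    matching = sum(1 for labels in jobs if job_matches_runner(labels, runner_labels))
    return max(min_replicas, min(max_replicas, matching))


# One job with the labels from this issue, and one job reported with no labels.
jobs = [["self-hosted", "large"], []]
replicas = desired_replicas(jobs, ["large"], min_replicas=0, max_replicas=6)
```

Under these assumptions, the job reported with `"labels": []` never contributes to the replica count, which would be consistent with the job waiting for a runner that is never created.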

any suggestion on how to further debug this?
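One possible way to dig further is to take the `job_id` from the controller log line and look at what the GitHub REST API returns for that job, in particular its `labels` field. The sketch below only parses the log line and builds the URL for the "get a job for a workflow run" endpoint. It assumes the log line is tab-separated with a trailing JSON object, and that the job belongs to `my-org/frontend`; the helper names are made up for this example.

```python
import json


def parse_arc_log_line(line):
    """Split a tab-separated controller log line; the last field is assumed to be JSON."""
    timestamp, level, logger, message, fields = line.rstrip("\n").split("\t", 4)
    return {
        "timestamp": timestamp,
        "level": level,
        "logger": logger,
        "message": message,
        "fields": json.loads(fields),
    }


def job_api_url(owner, repo, job_id):
    """Build the REST URL for a single workflow job."""
    return f"https://api.github.com/repos/{owner}/{repo}/actions/jobs/{job_id}"


line = (
    "2023-05-22T10:02:22Z\tINFO\thorizontalrunnerautoscaler\t"
    "Detected job with no labels, which is not supported by ARC. Skipping anyway.\t"
    '{"labels": [], "run_id": 5044287443, "job_id": 13654547143}'
)
entry = parse_arc_log_line(line)
# Owner and repo are assumptions taken from the resource definitions in this issue.
url = job_api_url("my-org", "frontend", entry["fields"]["job_id"])
```

Fetching that URL with a token that can read the repository's Actions data should show whether GitHub itself reports the job with an empty `labels` array or whether the labels are lost somewhere later.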



### Describe the expected behavior

we shouldn't see this error in the ARC logs


### Whole Controller Logs

```shell
2023-05-22T10:02:22Z	INFO	horizontalrunnerautoscaler	Detected job with no labels, which is not supported by ARC. Skipping anyway.	{"labels": [], "run_id": 5044287443, "job_id": 13654547143}
```

### Whole Runner Pod Logs

```shell
there are no runner logs available
```

Additional Context

There is no runner in pending state, and there are available resources on the node(s).

May 22 '23 13:05 mattpopa

I have the same issue and I don't understand how to use TotalNumberOfQueuedAndInProgressWorkflowRuns

Nov 16 '23 14:11 rtsisyk
