pipelines-as-code

Pipelines never get triggered

Open jwitrick opened this issue 7 months ago • 5 comments

After recent updates to Tekton (pipelines: v1.0), some of my pipelines (created through PAC) fail to start. The PipelineRuns get created, but their status is PipelineRunPending and the stuck runs never progress.

Kubernetes reports this error:

admission webhook "validation.webhook.pipeline.tekton.dev" denied the request: validation failed: invalid value: Once the PipelineRun has started, only status updates are allowed: spec

I am not sure why some pipelines have this issue and others don't, but as of now the only way for me to proceed is to delete the Tekton webhooks validation.webhook.pipeline.tekton.dev and webhook.pipeline.tekton.dev.

Here is an example of my pipelinerun (inside the .tekton/ directory):

---
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: pr
  labels:
    pipeline-run-name: my-pr
  annotations:
    karpenter.sh/do-not-disrupt: "true"
    # The event we are targeting as seen from the webhook payload
    # this can be an array too, i.e: [pull_request, push]
    pipelinesascode.tekton.dev/on-event: "[pull_request]"

    # The branch or tag we are targeting (ie: main, refs/tags/*)
    pipelinesascode.tekton.dev/on-target-branch: "[main, release/*]"

    # Fetch the git-clone task from hub, we are able to reference later on it
    # with taskRef and it will automatically be embedded into our pipeline.

    pipelinesascode.tekton.dev/task: ".tekton/tasks/custom-vars.yaml" # TODO: Remove once CI is merged
    pipelinesascode.tekton.dev/task-1: ".tekton/tasks/git-clone.yaml"
    pipelinesascode.tekton.dev/task-2: ".tekton/tasks/read-config.yaml"
    pipelinesascode.tekton.dev/task-3: ".tekton/tasks/helm-publish.yaml"
    pipelinesascode.tekton.dev/task-4: "kubernetes-actions"
    pipelinesascode.tekton.dev/task-5: ".tekton/tasks/build-web-assets.yaml"
    pipelinesascode.tekton.dev/task-6: ".tekton/tasks/start-world.yaml"
    pipelinesascode.tekton.dev/task-7: ".tekton/tasks/gh-comment-image-scanner.yaml"
    pipelinesascode.tekton.dev/task-8: ".tekton/tasks/sonarqube-scanner.yaml"
    pipelinesascode.tekton.dev/task-9: ".tekton/tasks/unit-tests.yaml"
    pipelinesascode.tekton.dev/task-10: "github-set-status"
    pipelinesascode.tekton.dev/task-11: ".tekton/tasks/main-deploy.yaml"
    # How many runs we want to keep.
    pipelinesascode.tekton.dev/max-keep-runs: "5"
spec:
  params:
    # The variable with brackets are special to Pipelines as Code
    # They will automatically be expanded with the events from Github.
    # https://pipelinesascode.com/docs/guide/authoringprs/#default-parameters
    - name: url
      value: "{{ repo_url }}"
    - name: revision
      value: "{{ revision }}"
    - name: repo_name
      value: "{{ repo_name }}"
    - name: branch_name
      value: "{{ source_branch }}"
    - name: pull_request_number
      value: "{{ pull_request_number }}"
    - name: pull_request_base_ref
      value: "{{ body.pull_request.base.ref }}"
    - name: pull_request_url
      value: "{{ body.pull_request.html_url }}"
    - name: assets_env
      value: ""
    - name: clone_url
      value: "{{ repo_url }}"
    - name: pipeline_label_selector
      value: "pipeline-run-name=my-pr,pipelinesascode.tekton.dev/state=started,pipelinesascode.tekton.dev/state!=completed,pipelinesascode.tekton.dev/state!=failed,pipelinesascode.tekton.dev/state!=cancelled,pipelinesascode.tekton.dev/sha!={{ revision }},pipelinesascode.tekton.dev/pull-request=={{ pull_request_number }}"
  # This works.. and only affects build-images
  taskRunTemplate:
    serviceAccountName: cicd
    podTemplate:
      nodeSelector:
        nodes.io/node-role: iops
      tolerations:
      - effect: NoExecute
        key: role
        operator: Equal
        value: iops
      securityContext:
        fsGroup: 65532
        fsGroupChangePolicy: OnRootMismatch
  taskRunSpecs:
    - pipelineTaskName: fetch-repository
      computeResources:
        requests:
          cpu: 2
          memory: 256Mi
    - pipelineTaskName: build-images
      metadata:
        annotations:
          karpenter.sh/do-not-disrupt: "true"
      stepSpecs:
        - name: get-skaffold-output-images
          computeResources:
            requests:
              cpu: 1
              memory: 256Mi
        - name: build-image
          computeResources:
            requests:
              cpu: 2
              memory: 13Gi
        - name: write-digest
          computeResources:
            requests:
              cpu: 1
              memory: 256Mi
        - name: digest-to-results
          computeResources:
            requests:
              cpu: 1
              memory: 256Mi
    - pipelineTaskName: rubocop
      metadata:
        annotations:
          karpenter.sh/do-not-disrupt: "true"
      computeResources:
        requests:
          cpu: 2
          memory: 2Gi
    - pipelineTaskName: unit-test
      metadata:
        annotations:
          karpenter.sh/do-not-disrupt: "true"
      stepSpecs:
        - name: run-unit-test
          computeResources:
            requests:
              cpu: 2
              memory: 2Gi
        - name: check-test-results
          computeResources:
            requests:
              cpu: 1
              memory: 256Mi
    - pipelineTaskName: sonarqube-scanner
      metadata:
        annotations:
          karpenter.sh/do-not-disrupt: "true"
      computeResources:
        requests:
          cpu: 2
          memory: 2Gi
  pipelineRef:
    name: ci
  timeouts:
    pipeline: "1h30m00s"
  workspaces:
  - name: sonar_cache
    persistentVolumeClaim:
      claimName: efs-tekton-direct-cicd
    subPath: "CI/my/sonar_cache"
  - name: source
    volumeClaimTemplate:
      spec:
        storageClassName: efs-tekton-sc-dynamic
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 2Gi
  - name: dockerconfig
    secret:
      secretName: docker-credentials
  - name: scratch
    volumeClaimTemplate:
      spec:
        storageClassName: efs-tekton-sc-dynamic
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 2Gi
  # This workspace will inject secret to help the git-clone task to be able to
  # checkout the private repositories
  - name: basic-auth
    secret:
      secretName: "{{ git_auth_secret }}"

At this point I'm not sure what exactly is being modified, but it only happens with this PipelineRun, and only through pipelinesascode.

jwitrick, May 19 '25 19:05

@vdemeester any idea?

chmouel, May 21 '25 08:05

🤔 We should try to reproduce this. @chmouel given that there is no spec.status in the above PipelineRun definition, I guess PAC is setting the status to PipelineRunPending, right?

I wonder if there could be a race (or at least a bug) where the webhook (or the controller) thinks the PipelineRun has started when it hasn't…

vdemeester, May 21 '25 09:05

Is concurrency used?
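
For context, PAC concurrency is configured through the `concurrency_limit` field on the Repository custom resource; as far as I understand, when it is set PAC creates PipelineRuns in the pending state and starts them later as slots free up. A minimal sketch of such a Repository follows. The name, namespace, and URL are hypothetical placeholders, not values from this issue:

```yaml
# Hypothetical Repository CR with concurrency enabled.
# Only `concurrency_limit` is the point here; every other value is a placeholder.
apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: example-repo          # hypothetical
  namespace: example-ci       # hypothetical
spec:
  url: "https://github.com/example-org/example-repo"  # hypothetical
  # Maximum number of PipelineRuns allowed to run at once for this repository
  concurrency_limit: 1
```

If this field is absent from the Repository CR, concurrency is not in play and the pending status would have to come from somewhere else.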

chmouel, May 21 '25 09:05

@jwitrick can you share the full YAML of the object? I am interested in seeing the status, because if it's a race, it might be "seen" as started from the Pipeline controller's perspective even though it has the pending status.

Also, which version of pipelines-as-code are you running?

vdemeester, May 21 '25 10:05

(I tried to replicate this simply by setting PipelineRunPending when creating the object, and it just works. I guess that if it's using PAC concurrency, and it's the watcher that sets PipelineRunPending, something might go wrong.)
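
A reproduction attempt of that kind could look roughly like the manifest below. This is only a sketch of the idea, not the exact object that was tested: the `generateName`, the task, and the image are assumptions chosen to keep it self-contained.

```yaml
# Minimal PipelineRun created directly in the pending state (illustrative only).
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: pending-repro-   # hypothetical name prefix
spec:
  # Created as pending; the run should not start until this field is cleared
  status: PipelineRunPending
  pipelineSpec:
    tasks:
      - name: hello              # hypothetical task
        taskSpec:
          steps:
            - name: echo
              image: busybox     # assumed image, any small image works
              script: |
                echo "hello"
```

Removing `spec.status` from the created object afterwards is the update that should let the run start; per the error above, that same kind of spec update is what the validation webhook rejects in the failing case.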

vdemeester, May 21 '25 10:05