fix fan out matrix task failed due to result ref

Open chengjoey opened this issue 1 year ago • 4 comments

Changes

fix fan out matrix task failed due to result ref

fixes #8324

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • [ ] Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • [ ] Has Tests included if any functionality added or changed
  • [ ] pre-commit Passed
  • [x] Follows the commit message standard
  • [x] Meets the Tekton contributor standards (including functionality, content, code)
  • [x] Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • [ ] Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • [ ] Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

NONE

chengjoey avatar Oct 14 '24 09:10 chengjoey

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

To complete the pull request process, please assign jerop after the PR has been reviewed. You can assign the PR to them by writing /assign @jerop in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

tekton-robot avatar Oct 14 '24 09:10 tekton-robot

/kind bug

chengjoey avatar Oct 14 '24 09:10 chengjoey

/hold

this can successfully fan out the matrix task, but need to determine whether we should do this.

and need to add tests

chengjoey avatar Oct 14 '24 09:10 chengjoey

@chengjoey: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-tekton-pipeline-unit-tests | c48ebd66dc106b4ce4399bdcd69e8fd9b4b75a63 | link | true | /test pull-tekton-pipeline-unit-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

tekton-robot avatar Oct 14 '24 10:10 tekton-robot

@chengjoey still working on this ?

vdemeester avatar Nov 06 '24 08:11 vdemeester

@vdemeester Yes. I am changing jobs at the moment, so I will continue this fix once things have settled.

chengjoey avatar Nov 06 '24 15:11 chengjoey

+1 on this PR. I have the same behaviour as #8324.

fabricebrito avatar Jan 08 '25 16:01 fabricebrito

Thanks @fabricebrito - I'm afraid @chengjoey may not be working on this PR anymore. Would you be interested in picking this up and getting it ready to merge? We would need to add tests to check that the behaviour has been corrected and avoid future regressions.

@vdemeester do you know what concern, if any, @chengjoey had about this? See the comment:

but need to determine whether we should do this.

afrittoli avatar Jan 09 '25 13:01 afrittoli

@afrittoli my Go skills are very limited, so I don't think I can help here, I'm sorry. It's just that this PR would allow using the fan-out results from a matrix Task, and this is key in our Tekton pipelines. I'm really looking forward to a fix :-)

fabricebrito avatar Jan 09 '25 13:01 fabricebrito

Hi @afrittoli, I recently went back to review the PR I mentioned before, but I have not yet invested time in this PR (the logic is complex). If someone is willing to continue, I will provide assistance. If not, I will increase the priority of this PR.

chengjoey avatar Jan 09 '25 14:01 chengjoey

I will follow up on this issue. I should have time to deal with it within two weeks.

l-qing avatar Jan 14 '25 04:01 l-qing

1. Reproducible steps

$ kubectl version

Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.28.8

$ tkn version

Client version: 0.39.0
Pipeline version: v0.66.0

$ cat <<'EOF' | kubectl replace -f -
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: array-emitter
spec:
  results:
  - name: array
    type: array
  steps:
    - name: echo
      image: mirror.gcr.io/alpine
      script: |
        echo -n "[\"linux\",\"max\",\"windows\"]" > $(results.array.path)

---
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: platform-browsers
spec:
  params:
    - name: platform
  results:
  - name: str
    type: string
  steps:
    - name: echo
      image: mirror.gcr.io/alpine
      script: |
        echo -n "$(params.platform)" | tee $(results.str.path)

---
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: printer
spec:
  params:
    - name: platform
      default: "default-platform"
    - name: platforms
      default: []
  steps:
    - name: echo
      image: mirror.gcr.io/alpine
      args:
        - "$(params.platforms)"
      script: |
        if [ -n "$(params.platform)" ]; then
          echo "platform: $(params.platform)"
        fi
        if [ $# -gt 0 ]; then
          echo "platforms: $@"
        fi

---
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: matrixed-pr
spec:
  taskRunTemplate:
    serviceAccountName: "default"
  pipelineSpec:
    tasks:

    - name: array-emitter
      taskRef:
        name: array-emitter

    - name: platforms
      params:
        - name: test
          value: test
      matrix:
        params:
          - name: platform
            value: $(tasks.array-emitter.results.array[*])
      taskRef:
        name: platform-browsers

    - name: printer-matrix
      taskRef:
        name: printer
      matrix:
        params:
          - name: platform
            value: $(tasks.platforms.results.str[*])

    - name: printer-all-platforms
      taskRef:
        name: printer
      params:
        - name: platforms
          value: $(tasks.platforms.results.str[*])
EOF

2. Error message

invalid result reference in pipeline task "printer-matrix": unable to validate result referencing pipeline task "platforms": task spec not found

apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  creationTimestamp: "2025-01-14T07:40:10Z"
  generation: 1
  name: matrixed-pr
  namespace: default
  resourceVersion: "26861"
  uid: feba93ac-759d-4817-8512-700eb8eef059
spec:
  pipelineSpec:
    tasks:
    - name: array-emitter
      taskRef:
        kind: Task
        name: array-emitter
    - matrix:
        params:
        - name: platform
          value: $(tasks.array-emitter.results.array[*])
      name: platforms
      params:
      - name: test
        value: test
      taskRef:
        kind: Task
        name: platform-browsers
    - matrix:
        params:
        - name: platform
          value: $(tasks.platforms.results.str[*])
      name: printer-matrix
      taskRef:
        kind: Task
        name: printer
    - name: printer-all-platforms
      params:
      - name: platforms
        value: $(tasks.platforms.results.str[*])
      taskRef:
        kind: Task
        name: printer
  taskRunTemplate:
    serviceAccountName: default
  timeouts:
    pipeline: 1h0m0s
status:
  completionTime: "2025-01-14T07:40:12Z"
  conditions:
  - lastTransitionTime: "2025-01-14T07:40:12Z"
    message: 'invalid result reference in pipeline task "printer-matrix": unable to
      validate result referencing pipeline task "platforms": task spec not found'
    reason: InvalidTaskResultReference
    status: "False"
    type: Succeeded
  pipelineSpec:
    tasks:
    - name: array-emitter
      taskRef:
        kind: Task
        name: array-emitter
    - matrix:
        params:
        - name: platform
          value: $(tasks.array-emitter.results.array[*])
      name: platforms
      params:
      - name: test
        value: test
      taskRef:
        kind: Task
        name: platform-browsers
    - matrix:
        params:
        - name: platform
          value: $(tasks.platforms.results.str[*])
      name: printer-matrix
      taskRef:
        kind: Task
        name: printer
    - name: printer-all-platforms
      params:
      - name: platforms
        value: $(tasks.platforms.results.str[*])
      taskRef:
        kind: Task
        name: printer
  provenance:
    featureFlags:
      AwaitSidecarReadiness: true
      Coschedule: workspaces
      DisableAffinityAssistant: false
      DisableCredsInit: false
      DisableInlineSpec: ""
      EnableAPIFields: beta
      EnableArtifacts: false
      EnableCELInWhenExpression: false
      EnableConciseResolverSyntax: false
      EnableKeepPodOnCancel: false
      EnableKubernetesSidecar: false
      EnableParamEnum: false
      EnableProvenanceInStatus: true
      EnableStepActions: false
      EnforceNonfalsifiability: none
      MaxResultSize: 4096
      RequireGitSSHSecretKnownHosts: false
      ResultExtractionMethod: termination-message
      RunningInEnvWithInjectedSidecars: true
      SendCloudEventsForRuns: false
      SetSecurityContext: false
      VerificationNoMatchPolicy: ignore
  startTime: "2025-01-14T07:40:12Z"

3. Analysis

a. validateResultRef: unable to validate result referencing pipeline task

if ptMap[ref.PipelineTask].ResolvedTask == nil || ptMap[ref.PipelineTask].ResolvedTask.TaskSpec == nil {

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/reconciler/pipelinerun/resources/validate_dependencies.go#L75-L77

b. ValidatePipelineTaskResults: invalid result reference in pipeline task

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/reconciler/pipelinerun/resources/validate_dependencies.go#L31-L36

c. PipelineRun-Reconcile: call ValidatePipelineTaskResults

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/reconciler/pipelinerun/pipelinerun.go#L697-L702

d. pipelineRunFacts.State: come from resolvePipelineState

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/reconciler/pipelinerun/pipelinerun.go#L612-L627

e. resolvePipelineState: resolvedTask - ResolvePipelineTask

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/reconciler/pipelinerun/pipelinerun.go#L368-L377

f. ResolvePipelineTask: call CountCombinations

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go#L602-L630

h1. CountCombinations: Calculate the number of TaskRuns

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/apis/pipeline/v1/matrix_types.go#L220-L242

h2. Error: The calculated count is 0.

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/apis/pipeline/v1/matrix_types.go#L233-L239

This happens because param.Value.StringVal is $(tasks.platforms.results.str[*]) while param.Value.ArrayVal is still empty, so there is nothing to count.

j1. ResolvePipelineTask: call GetNamesOfTaskRuns

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go#L628-L630

j2. GetNamesOfTaskRuns: calls getNewRunNames; because numberOfRuns is 0, the resulting taskRunNames is empty.

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go#L749-L771

k. ResolvePipelineTask: with no TaskRun names, setTaskRunsAndResolvedTask is never called.

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go#L628-L632

l. setTaskRunsAndResolvedTask: since it is never called, ResolvedTask is never set.

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go#L641-L662

m. This is what produces the error shown in section 2.
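The chain from a zero count to the validation error can be sketched end to end as follows. Everything here is a simplified, hypothetical model written for this comment: `resolvedPipelineTask`, `newRunNames`, `resolve` and `validate` only imitate the shape of the functions linked above, and the sketch assumes that the resolved task is set once per generated TaskRun name.

```go
package main

import "fmt"

// resolvedPipelineTask is a hypothetical stand-in for the reconciler's
// resolved state of one pipeline task.
type resolvedPipelineTask struct {
	name         string
	taskRunNames []string
	resolvedTask *string  // stand-in for ResolvedTask; nil until it is set
	resultRefs   []string // pipeline tasks whose results this task consumes
}

// newRunNames returns one generated name per expected run (steps j1 and j2).
func newRunNames(prefix string, numberOfRuns int) []string {
	names := make([]string, 0, numberOfRuns)
	for i := 0; i < numberOfRuns; i++ {
		names = append(names, fmt.Sprintf("%s-%d", prefix, i))
	}
	return names
}

// resolve sets the resolved task once per run name (assumption, steps k and l).
// With zero names the loop body never runs and resolvedTask stays nil.
func resolve(name string, numberOfRuns int, refs ...string) *resolvedPipelineTask {
	rpt := &resolvedPipelineTask{name: name, resultRefs: refs}
	rpt.taskRunNames = newRunNames(name, numberOfRuns)
	for range rpt.taskRunNames {
		spec := "spec of " + name
		rpt.resolvedTask = &spec
	}
	return rpt
}

// validate reports the first result reference that points at a pipeline task
// without a resolved task (steps a and b).
func validate(state []*resolvedPipelineTask) error {
	byName := make(map[string]*resolvedPipelineTask, len(state))
	for _, rpt := range state {
		byName[rpt.name] = rpt
	}
	for _, rpt := range state {
		for _, ref := range rpt.resultRefs {
			if target := byName[ref]; target == nil || target.resolvedTask == nil {
				return fmt.Errorf("invalid result reference in pipeline task %q: unable to validate result referencing pipeline task %q: task spec not found", rpt.name, ref)
			}
		}
	}
	return nil
}

func main() {
	// A count of 0 leaves "platforms" unresolved, so the reference to it fails.
	broken := []*resolvedPipelineTask{
		resolve("platforms", 0),
		resolve("printer-matrix", 0, "platforms"),
	}
	fmt.Println(validate(broken))

	// With a non-zero count for "platforms" the same reference validates.
	fixed := []*resolvedPipelineTask{
		resolve("platforms", 3),
		resolve("printer-matrix", 0, "platforms"),
	}
	fmt.Println(validate(fixed))
}
```

In this model the validation outcome depends only on whether the referenced task was resolved, which is why a fix would likely need to resolve the task spec even when no TaskRun names can be generated yet.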

4. Action

I will add the unit tests and corresponding integration tests later.

l-qing avatar Jan 14 '25 08:01 l-qing

@chengjoey I am unable to push commits to that branch of your code repository.

Could you either grant me the appropriate permissions or merge this modification into your branch?

https://github.com/tektoncd/pipeline/compare/main...l-qing:pipeline:fix/matrix-result-ref

l-qing avatar Jan 14 '25 12:01 l-qing

It seems that this usage might bypass the limitation of DefaultMaxMatrixCombinationsCount.

https://github.com/tektoncd/pipeline/blob/1dd488eda738a124e9dfe8874dbb192f8bc30839/pkg/apis/pipeline/v1/matrix_types.go#L294-L301

l-qing avatar Jan 14 '25 12:01 l-qing