pipeline icon indicating copy to clipboard operation
pipeline copied to clipboard

allow for retry on typically transient k8s errors in both core controller and resolver for remote resolution

Open gabemontero opened this issue 1 year ago • 21 comments

Changes

collaborating with @khrm here is an augmented version of https://github.com/tektoncd/pipeline/pull/7893

Fixes https://github.com/tektoncd/pipeline/issues/7909

During both sides of remote resolution (core controller and resolver) typically transient kubernetes errors were being treated as permanent knative errors and no attempts at trying to reconcile again were made, leading to failures which could be avoided.

These changes addresses that.

An example log snippet from the core controller

Pipeline rh-acs-tenant/operator-on-pull-request-bwqxj can't be Run; it contains Tasks that don't exist: Couldn't retrieve Task "": retryable error validating referenced object source-build: Internal error occurred: failed calling webhook "validation.webhook.pipeline.tekton.dev": failed to call webhook: Post "https://tekton-pipelines-webhook.openshift-pipelines.svc:443/resource-validation?timeout=10s": context deadline exceeded

Accompanying log snippet from the resolver

{"level":"error","ts":"2024-04-17T10:50:05.866Z","logger":"controller","caller":"controller/controller.go:566","msg":"Reconcile error","commit":"f0a1d64","knative.dev/traceid":"b893d6a6-2eb7-4a53-b502-1348803a7085","knative.dev/key":"rh-acs-tenant/bundles-780a1fe396cb0f8c702b34e9289fc770","duration":"10.3628985s","error":"error updating resource request \"rh-acs-tenant/bundles-780a1fe396cb0f8c702b34e9289fc770\" with data: Internal error occurred: failed calling webhook \"webhook.pipeline.tekton.dev\": failed to call webhook: Post \"https://tekton-pipelines-webhook.openshift-pipelines.svc:443/defaulting?timeout=10s\": context deadline exceeded","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\t/go/src/github.com/tektoncd/pipeline/vendor/knative.dev/pkg/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\t/go/src/github.com/tektoncd/pipeline/vendor/knative.dev/pkg/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\t/go/src/github.com/tektoncd/pipeline/vendor/knative.dev/pkg/controller/controller.go:491"}

/kind bug

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • [ n/a] Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • [/ ] Has Tests included if any functionality added or changed
  • [ /] pre-commit Passed
  • [/ ] Follows the commit message standard
  • [ ] Meets the Tekton contributor standards (including functionality, content, code)
  • [/] Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • [ /] Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • [n/a ] Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

This fix address the lack of retry on transient kubernetes errors during remote resolution for tasks, etc. 

gabemontero avatar Apr 18 '24 17:04 gabemontero

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/taskref.go 94.0% 90.2% -3.9
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 73.9% 0.8

tekton-robot avatar Apr 18 '24 17:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/taskref.go 94.0% 90.2% -3.9
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 73.9% 0.8

tekton-robot avatar Apr 18 '24 17:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 73.9% 0.8

tekton-robot avatar Apr 18 '24 18:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 73.9% 0.8

tekton-robot avatar Apr 18 '24 19:04 tekton-robot

just pushed an extra-credit commit that properly generates TaskNotFoundError when the task ref name is not set in the Name field, but rather comes from the name Param

like:

      taskRef:
        kind: Task
        params:
        - name: name
          value: summary
        - name: bundle
          value: myregistry/myrepo/myimage:mytag_or_sha
        - name: kind
          value: task
        resolver: bundles

I can break it off into a separate PR if desired

gabemontero avatar Apr 18 '24 19:04 gabemontero

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.8% 0.1
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 73.9% 0.8

tekton-robot avatar Apr 18 '24 19:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.8% 0.1
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 73.9% 0.8

tekton-robot avatar Apr 18 '24 19:04 tekton-robot

LGTM

savitaashture avatar Apr 19 '24 06:04 savitaashture

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 76.2% 45.7% -30.5

tekton-robot avatar Apr 19 '24 20:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 76.2% 45.7% -30.5

tekton-robot avatar Apr 19 '24 20:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 76.2% 45.7% -30.5

tekton-robot avatar Apr 22 '24 12:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 76.2% 45.7% -30.5

tekton-robot avatar Apr 22 '24 12:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 76.2% 45.7% -30.5

tekton-robot avatar Apr 22 '24 13:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 76.2% 45.7% -30.5

tekton-robot avatar Apr 22 '24 13:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 76.2% 45.7% -30.5

tekton-robot avatar Apr 24 '24 19:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.0% 94.8% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 76.2% 45.7% -30.5

tekton-robot avatar Apr 24 '24 19:04 tekton-robot

/assign

chitrangpatel avatar Apr 26 '24 23:04 chitrangpatel

/hold cancel

gabemontero avatar May 03 '24 17:05 gabemontero

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.2% 95.0% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 76.2% 82.9% 6.7

tekton-robot avatar May 06 '24 17:05 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.2% 95.0% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 76.2% 82.9% 6.7

tekton-robot avatar May 06 '24 17:05 tekton-robot

ok @chitrangpatel @afrittoli I have rebased on top of @chitrangpatel 's recently merged refactor

gabemontero avatar May 14 '24 21:05 gabemontero

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.3% 95.0% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 55.3% 65.4% 10.1

tekton-robot avatar May 14 '24 21:05 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 94.3% 95.0% 0.8
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/resource/name.go 55.3% 65.4% 10.1

tekton-robot avatar May 14 '24 21:05 tekton-robot

Can you also make the corresponding changes from pkg/resolution/resolver/framework/reconciler.go and its test

into pkg/remoteresolution/resolver/framework/reconciler.go and its test file?

All the other files are LGTM!

Thats the newer framework. The current one will be deprecated at somepoint.

Thanks @chitrangpatel - as I mentioned on slack, I think we should only do the changes in the new framework, since that's what is actually deployed and used now, and I would not want to have to maintain both. The old framework is still around to provide some room for potential users to move to the new one, but I would not backport changes to it.

@chitrangpatel should we mark the old resolvers as deprecated already?

afrittoli avatar May 15 '24 08:05 afrittoli

Can you also make the corresponding changes from pkg/resolution/resolver/framework/reconciler.go and its test into pkg/remoteresolution/resolver/framework/reconciler.go and its test file? All the other files are LGTM! Thats the newer framework. The current one will be deprecated at somepoint.

Thanks @chitrangpatel - as I mentioned on slack, I think we should only do the changes in the new framework, since that's what is actually deployed and used now, and I would not want to have to maintain both. The old framework is still around to provide some room for potential users to move to the new one, but I would not backport changes to it.

@chitrangpatel should we mark the old resolvers as deprecated already?

Yes, I'm happy to. I submitted a PR for that already.

chitrangpatel avatar May 15 '24 10:05 chitrangpatel

ok @chitrangpatel @afrittoli I moved those changes from the deprecated resolution subpackage over to remoteresolution

there was a minor goimport cleanup I left in the old deprecated package reconciler_test.go file

I also squashed the commits and updated the git commit message accordingling

PTAL / thanks

gabemontero avatar May 15 '24 14:05 gabemontero

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 93.1% 93.8% 0.7
pkg/remoteresolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resource/name.go 55.3% 65.4% 10.1

tekton-robot avatar May 15 '24 15:05 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 93.1% 93.8% 0.7
pkg/remoteresolution/resolver/framework/reconciler.go 73.1% 71.8% -1.3
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resource/name.go 55.3% 65.4% 10.1

tekton-robot avatar May 15 '24 15:05 tekton-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chitrangpatel, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [chitrangpatel,vdemeester]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

tekton-robot avatar May 15 '24 17:05 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.7% 0.0
pkg/reconciler/taskrun/resources/taskref.go 93.3% 94.0% 0.7
pkg/remoteresolution/resolver/framework/reconciler.go 76.1% 74.6% -1.5
pkg/resolution/common/errors.go 17.6% 13.0% -4.6
pkg/resolution/resource/name.go 56.1% 65.5% 9.4

tekton-robot avatar May 16 '24 14:05 tekton-robot