Fix for ResourceQuotaConflictError

Open yachna opened this issue 2 years ago • 17 comments

Fixes the bug by recreating the pod up to podRecreationLimit times if pod creation runs into a ResourceQuotaConflictError, i.e. https://github.com/kubernetes/kubernetes/issues/67761

Changes

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • [x] Has Docs included if any changes are user facing
  • [x] Has Tests included if any functionality added or changed
  • [x] Follows the commit message standard
  • [x] Meets the Tekton contributor standards (including functionality, content, code)
  • [x] Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • [x] Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings)
  • [x] Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Fixes the bug by recreating the pod up to podRecreationLimit times if pod creation runs into a ResourceQuotaConflictError, i.e. https://github.com/kubernetes/kubernetes/issues/67761

yachna avatar Aug 02 '22 13:08 yachna

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: yachna / name: Yachna (f37c173df7a7c6d386684767b2c72b7c493f6a7f)

Hi @yachna. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tekton-robot avatar Aug 02 '22 13:08 tekton-robot

/kind bug

SaschaSchwarze0 avatar Aug 02 '22 14:08 SaschaSchwarze0

Shouldn't this already be handled in https://github.com/tektoncd/pipeline/blob/94055d92c120a6010f3d61a821e45dea4f893a74/pkg/reconciler/taskrun/taskrun.go#L577?

Also see https://github.com/tektoncd/pipeline/issues/734

dibyom avatar Aug 03 '22 21:08 dibyom

Shouldn't this already be handled in

https://github.com/tektoncd/pipeline/blob/94055d92c120a6010f3d61a821e45dea4f893a74/pkg/reconciler/taskrun/taskrun.go#L577 ?

Also see #734

Hi @dibyom, we are not facing an exceeded resource quota. We face the problem that when you create a resource (for example a Pod) that is constrained by a resource quota, then the creation can fail because of conflicts while updating the resource quota status (that's the long-standing Kubernetes issue that @yachna mentioned, https://github.com/kubernetes/kubernetes/issues/67761).

But maybe that function is where we need to move the retry logic @yachna, with maybe a requeue after one second.

SaschaSchwarze0 avatar Aug 04 '22 07:08 SaschaSchwarze0

@SaschaSchwarze0 thanks for the explanation! and yeah, instead of implementing custom retry logic, I think we should catch the error and put it back in the work queue to retry.

dibyom avatar Aug 04 '22 15:08 dibyom

/ok-to-test

dibyom avatar Aug 04 '22 15:08 dibyom

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 80.6% 78.9% -1.7

tekton-robot avatar Aug 04 '22 15:08 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 80.6% 80.9% 0.3

tekton-robot avatar Aug 09 '22 03:08 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 80.6% 81.4% 0.8

tekton-robot avatar Aug 09 '22 09:08 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 80.6% 81.1% 0.5

tekton-robot avatar Aug 09 '22 09:08 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 80.6% 81.1% 0.5

tekton-robot avatar Aug 09 '22 10:08 tekton-robot

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: yachna / name: Yachna (3bb201f48b2c2e78f6c0b9e474e193d433c92bd6)

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 80.3% 80.8% 0.5

tekton-robot avatar Aug 10 '22 12:08 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 80.3% 80.8% 0.5

tekton-robot avatar Aug 10 '22 12:08 tekton-robot

@dibyom we are ready here, code has been moved :-)

SaschaSchwarze0 avatar Aug 11 '22 08:08 SaschaSchwarze0

Thanks @yachna and @SaschaSchwarze0. The code looks fine to me. One request: could we update the commit message with a bit more description of the issue that this commit solves (like the description here), so that it meets our commit guidelines? https://github.com/tektoncd/community/blob/main/standards.md#commits

dibyom avatar Aug 11 '22 16:08 dibyom

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tekton-robot avatar Aug 18 '22 14:08 tekton-robot

Thanks @yachna and @SaschaSchwarze0. The code looks fine to me. One request: could we update the commit message with a bit more description of the issue that this commit solves (like the description here), so that it meets our commit guidelines? https://github.com/tektoncd/community/blob/main/standards.md#commits

Hi @dibyom

Thanks for the suggestion. I have updated the commit message accordingly.

yachna avatar Aug 22 '22 03:08 yachna

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 80.4% 80.9% 0.5

tekton-robot avatar Aug 22 '22 03:08 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 80.4% 80.9% 0.5

tekton-robot avatar Aug 22 '22 04:08 tekton-robot

/lgtm

dibyom avatar Aug 30 '22 22:08 dibyom

/test pull-tekton-pipeline-integration-tests
/test pull-tekton-pipeline-alpha-integration-tests

SaschaSchwarze0 avatar Aug 31 '22 06:08 SaschaSchwarze0

/retest

abayer avatar Aug 31 '22 13:08 abayer