karmada icon indicating copy to clipboard operation
karmada copied to clipboard

support auto delete WorkloadRebalancer when time up

Open chaosi-zju opened this issue 1 year ago • 12 comments

What type of PR is this?

/kind feature

What this PR does / why we need it:

Support auto delete WorkloadRebalancer when time up:

referring to Automatic Cleanup for Finished Jobs.

Introduces field ttlSecondsAfterFinished which limits the lifetime of a WorkloadRebalancer that has finished execution (finished execution means each target workload is finished with result of Successful or Failed).

  • If this field is set, ttlSecondsAfterFinished after the WorkloadRebalancer finishes, it is eligible to be automatically deleted.
  • If this field is unset, the WorkloadRebalancer won't be automatically deleted.
  • If this field is set to zero, the WorkloadRebalancer becomes eligible to be deleted immediately after it finishes.

Considering several corner cases:

  • case 1: if a new target workload was added into WorkloadRebalancer before ttlSecondsAfterFinished expired, which means the finish time of the WorkloadRebalancer is refreshed, so the delete action is deferred since expire time is refreshed too.
  • case 2: if ttlSecondsAfterFinished is modified before ttlSecondsAfterFinished expired, delete action should be performed according to latest ttlSecondsAfterFinished.
  • case 3: when we have got and checked latest WorkloadRebalancer object and try to delete it, if a modification to WorkloadRebalancer occurred just right between the two time point, the previous delete action should be Interrupted.

Several key implementation:

  • A WorkloadRebalancer is judged as finished should meet two requirements:
    • all expected workloads are finished with result of Successful or Failed.
    • introduce a new field named ObservedGeneration to Status of WorkloadRebalancer, and it should be equal to .metadata.Generation, to prevent that the WorkloadRebalancer is updated but controller hasn't in time refreshed its Status.
  • When WorkloadRebalancer is Created or Updated, add it to the workqueue and calculate its expiring time, and call workqueue.AddAfter() function to re-enqueue it once more if it hasn't expired.
  • Before deleting the WorkloadRebalancer, do a final sanity check. Use the latest WorkloadRebalancer directly fetched from api server to see if the TTL truly expires, rather than object from lister cache.
  • When deleting the WorkloadRebalancer, it is needed to confirm that the resourceVersion of the deleted object is as expected, to prevent from above corner case 3.

Which issue(s) this PR fixes:

Fixes part of #4840

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

support auto delete WorkloadRebalancer when time up

chaosi-zju avatar Apr 30 '24 07:04 chaosi-zju

/hold

for #4875 #4860

chaosi-zju avatar Apr 30 '24 07:04 chaosi-zju

/hold cancel

chaosi-zju avatar May 09 '24 13:05 chaosi-zju

Codecov Report

Attention: Patch coverage is 71.92982% with 16 lines in your changes are missing coverage. Please review.

Project coverage is 53.34%. Comparing base (d465fcd) to head (a8b4050). Report is 2 commits behind head on master.

Files Patch % Lines
...orkloadrebalancer/workloadrebalancer_controller.go 71.92% 9 Missing and 7 partials :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4894   +/-   ##
=======================================
  Coverage   53.34%   53.34%           
=======================================
  Files         252      252           
  Lines       20481    20531   +50     
=======================================
+ Hits        10926    10953   +27     
- Misses       8834     8853   +19     
- Partials      721      725    +4     
Flag Coverage Δ
unittests 53.34% <71.92%> (+<0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar May 09 '24 13:05 codecov-commenter

CC @RainbowMango can this begin to review?

chaosi-zju avatar May 13 '24 03:05 chaosi-zju

/assign

By the way.

If this field is set, ttlSecondsAfterFinished after the WorkloadRebalancer finishes, it is eligible to be automatically deleted.

ttlSecondsAfterFinished --> ttlMinutesAfterFinished in the PR description.

RainbowMango avatar May 14 '24 08:05 RainbowMango

CC @RainbowMango ready to continue

besides, I modified a little code in e2e test, please help me do a check @XiShanYongYe-Chang

chaosi-zju avatar May 22 '24 10:05 chaosi-zju

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot avatar May 24 '24 08:05 karmada-bot

/hold Please @XiShanYongYe-Chang take another look, particularly the e2e part.

RainbowMango avatar May 24 '24 08:05 RainbowMango

Please @XiShanYongYe-Chang take another look, particularly the e2e part.

@XiShanYongYe-Chang not only e2e, but also ut

chaosi-zju avatar May 24 '24 08:05 chaosi-zju

@RainbowMango sorry, I did a updation after you send lgtm: https://github.com/karmada-io/karmada/compare/e405822231187e59ef62505a349aea75b8411ed2..f5076cdbeeb796d7df6426beb7ad98a1e790ceac

chaosi-zju avatar May 24 '24 09:05 chaosi-zju

Ok~ /assign

XiShanYongYe-Chang avatar May 24 '24 09:05 XiShanYongYe-Chang

[FAILED] in [DeferCleanup (Each)] - /home/runner/work/karmada/karmada/test/e2e/framework/workloadrebalancer.go:46 @ 05/25/24 10:13:54.839

It seems related, see the logs here.

[FAILED] Unexpected error:
      <*errors.StatusError | 0xc0005c8d20>: 
      workloadrebalancers.apps.karmada.io "rebalancer-lhnmp" not found
      {
          ErrStatus: {
              TypeMeta: {Kind: "", APIVersion: ""},
              ListMeta: {
                  SelfLink: "",
                  ResourceVersion: "",
                  Continue: "",
                  RemainingItemCount: nil,
              },
              Status: "Failure",
              Message: "workloadrebalancers.apps.karmada.io \"rebalancer-lhnmp\" not found",
              Reason: "NotFound",
              Details: {
                  Name: "rebalancer-lhnmp",
                  Group: "apps.karmada.io",
                  Kind: "workloadrebalancers",
                  UID: "",
                  Causes: nil,
                  RetryAfterSeconds: 0,
              },
              Code: 404,
          },
      }
  occurred
  In [DeferCleanup (Each)] at: /home/runner/work/karmada/karmada/test/e2e/framework/workloadrebalancer.go:46 @ 05/25/24 10:13:54.839

RainbowMango avatar May 27 '24 01:05 RainbowMango

CI problem fixed, is there any further comments?

chaosi-zju avatar May 27 '24 02:05 chaosi-zju

/lgtm Thanks

XiShanYongYe-Chang avatar May 27 '24 02:05 XiShanYongYe-Chang

/hold cancel

RainbowMango avatar May 27 '24 03:05 RainbowMango