karmada
karmada copied to clipboard
support auto delete WorkloadRebalancer when time up
What type of PR is this?
/kind feature
What this PR does / why we need it:
Support auto delete WorkloadRebalancer when time up:
referring to Automatic Cleanup for Finished Jobs.
Introduces field ttlSecondsAfterFinished which limits the lifetime of a WorkloadRebalancer that has finished execution
(finished execution means each target workload is finished with result of Successful or Failed).
- If this field is set,
ttlSecondsAfterFinishedafter the WorkloadRebalancer finishes, it is eligible to be automatically deleted. - If this field is unset, the WorkloadRebalancer won't be automatically deleted.
- If this field is set to zero, the WorkloadRebalancer becomes eligible to be deleted immediately after it finishes.
Considering several corner cases:
- case 1: if a new target workload was added into
WorkloadRebalancerbeforettlSecondsAfterFinishedexpired, which means the finish time of theWorkloadRebalanceris refreshed, so thedeleteaction is deferred since expire time is refreshed too. - case 2: if
ttlSecondsAfterFinishedis modified beforettlSecondsAfterFinishedexpired,deleteaction should be performed according to latestttlSecondsAfterFinished. - case 3: when we have got and checked latest
WorkloadRebalancerobject and try to delete it, if a modification toWorkloadRebalanceroccurred just right between the two time point, the previousdeleteaction should be Interrupted.
Several key implementation:
- A
WorkloadRebalanceris judged as finished should meet two requirements:- all expected workloads are finished with result of
SuccessfulorFailed. - introduce a new field named
ObservedGenerationtoStatusof WorkloadRebalancer, and it should be equal to.metadata.Generation, to prevent that the WorkloadRebalancer is updated but controller hasn't in time refreshed itsStatus.
- all expected workloads are finished with result of
- When
WorkloadRebalancerisCreatedorUpdated, add it to the workqueue and calculate its expiring time, and callworkqueue.AddAfter()function to re-enqueue it once more if it hasn't expired. - Before deleting the
WorkloadRebalancer, do a final sanity check. Use the latestWorkloadRebalancerdirectly fetched from api server to see if the TTL truly expires, rather than object from lister cache. - When deleting the
WorkloadRebalancer, it is needed to confirm that theresourceVersionof the deleted object is as expected, to prevent from above corner case 3.
Which issue(s) this PR fixes:
Fixes part of #4840
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
support auto delete WorkloadRebalancer when time up
/hold
for #4875 #4860
/hold cancel
Codecov Report
Attention: Patch coverage is 71.92982% with 16 lines in your changes are missing coverage. Please review.
Project coverage is 53.34%. Comparing base (
d465fcd) to head (a8b4050). Report is 2 commits behind head on master.
| Files | Patch % | Lines |
|---|---|---|
| ...orkloadrebalancer/workloadrebalancer_controller.go | 71.92% | 9 Missing and 7 partials :warning: |
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@ Coverage Diff @@
## master #4894 +/- ##
=======================================
Coverage 53.34% 53.34%
=======================================
Files 252 252
Lines 20481 20531 +50
=======================================
+ Hits 10926 10953 +27
- Misses 8834 8853 +19
- Partials 721 725 +4
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 53.34% <71.92%> (+<0.01%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
CC @RainbowMango can this begin to review?
/assign
By the way.
If this field is set, ttlSecondsAfterFinished after the WorkloadRebalancer finishes, it is eligible to be automatically deleted.
ttlSecondsAfterFinished --> ttlMinutesAfterFinished in the PR description.
CC @RainbowMango ready to continue
besides, I modified a little code in e2e test, please help me do a check @XiShanYongYe-Chang
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [RainbowMango]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/hold Please @XiShanYongYe-Chang take another look, particularly the e2e part.
Please @XiShanYongYe-Chang take another look, particularly the e2e part.
@XiShanYongYe-Chang not only e2e, but also ut
@RainbowMango sorry, I did a updation after you send lgtm: https://github.com/karmada-io/karmada/compare/e405822231187e59ef62505a349aea75b8411ed2..f5076cdbeeb796d7df6426beb7ad98a1e790ceac
Ok~ /assign
[FAILED] in [DeferCleanup (Each)] - /home/runner/work/karmada/karmada/test/e2e/framework/workloadrebalancer.go:46 @ 05/25/24 10:13:54.839
It seems related, see the logs here.
[FAILED] Unexpected error:
<*errors.StatusError | 0xc0005c8d20>:
workloadrebalancers.apps.karmada.io "rebalancer-lhnmp" not found
{
ErrStatus: {
TypeMeta: {Kind: "", APIVersion: ""},
ListMeta: {
SelfLink: "",
ResourceVersion: "",
Continue: "",
RemainingItemCount: nil,
},
Status: "Failure",
Message: "workloadrebalancers.apps.karmada.io \"rebalancer-lhnmp\" not found",
Reason: "NotFound",
Details: {
Name: "rebalancer-lhnmp",
Group: "apps.karmada.io",
Kind: "workloadrebalancers",
UID: "",
Causes: nil,
RetryAfterSeconds: 0,
},
Code: 404,
},
}
occurred
In [DeferCleanup (Each)] at: /home/runner/work/karmada/karmada/test/e2e/framework/workloadrebalancer.go:46 @ 05/25/24 10:13:54.839
CI problem fixed, is there any further comments?
/lgtm Thanks
/hold cancel