karmada
Proposal to introduce a rebalance mechanism that actively triggers rescheduling of resources
What type of PR is this?
/kind design /kind documentation
What this PR does / why we need it:
This PR proposes a rebalance mechanism that actively triggers rescheduling of resources.
Assume the user has propagated workloads to member clusters. In some scenarios the current replica distribution is no longer the desired one, for example:
- replicas were migrated because of a cluster failover, and the cluster has since recovered.
- replicas were migrated because of an application-level failover, and each cluster now has sufficient resources to run the replicas.
- with the `Aggregated` scheduling strategy, replicas were initially spread across multiple clusters because of resource constraints, but one cluster can now accommodate all of them.
Therefore, users want a way to trigger rescheduling so that the replica distribution can be rebalanced.
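The `Aggregated` scenario above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Karmada's scheduler code: the function `aggregate`, the cluster names, and the capacity numbers are all hypothetical, and "aggregated" is modeled simply as filling the clusters with the most available capacity first.

```python
from typing import Dict


def aggregate(replicas: int, available: Dict[str, int]) -> Dict[str, int]:
    """Toy model of an aggregated placement (hypothetical, not Karmada code).

    Packs replicas into as few clusters as possible by filling the
    clusters with the most available capacity first.
    """
    placement: Dict[str, int] = {}
    remaining = replicas
    # Visit clusters from most to least capacity; break ties by name
    # so the result is deterministic.
    for name, capacity in sorted(available.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining == 0:
            break
        take = min(capacity, remaining)
        if take > 0:
            placement[name] = take
            remaining -= take
    if remaining > 0:
        raise ValueError(f"{remaining} replica(s) cannot be placed")
    return placement


# At first scheduling, no single cluster can hold all 6 replicas,
# so they are split across two clusters.
initial = aggregate(6, {"member1": 4, "member2": 3})

# Later, member1 has room for everything. Without a rescheduling
# trigger the old split stays in place; recomputing the placement
# would consolidate the replicas onto one cluster.
rebalanced = aggregate(6, {"member1": 8, "member2": 3})

print(initial)
print(rebalanced)
```

In this sketch the placement only changes when `aggregate` is called again, which mirrors the motivation: the distribution stays as it was until something explicitly triggers a new scheduling pass.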
Which issue(s) this PR fixes:
Fixes part of #4840
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 53.33%. Comparing base (5bc8c54) to head (0e1922c). Report is 113 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #4698 +/- ##
==========================================
+ Coverage 53.12% 53.33% +0.20%
==========================================
Files 251 252 +1
Lines 20417 20482 +65
==========================================
+ Hits 10847 10924 +77
+ Misses 8856 8836 -20
- Partials 714 722 +8
Flag | Coverage Δ | |
---|---|---|
unittests | 53.33% <ø> (+0.20%) | :arrow_up: |
This PR mixes fault self-healing and rescheduling. I think fault self-healing includes rescheduling: when a node crashes, for example, the workload that owns the pods on that node regenerates them. This is accomplished by multiple controllers working together, including the scheduler. If the goal is self-healing, then coordination among multiple components needs to be considered. If it is only rescheduling, then only the eviction targets and the conditions for stopping eviction need to be considered. Could we consider the design concepts of the community's Descheduler project?
I put a lot of work into thoroughly improving this proposal. Everyone is welcome to go through it again; I look forward to your suggestions~
> This PR mixes fault self-healing and rescheduling.

@wu0407 Hello, I have updated this proposal. Actually, this proposal is entirely about rescheduling; cluster failover is only one of its user stories. For more information, please see the latest version of the proposal. Thank you for your comments~
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
- ~~OWNERS~~ [RainbowMango]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment