karmada
Propose binding priority and preemption
What type of PR is this? /kind design
What this PR does / why we need it: Propose binding priority and preemption
Which issue(s) this PR fixes: Fixes #4938
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
NONE
/cc @RainbowMango @XiShanYongYe-Chang
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 48.12%. Comparing base (d4c2793) to head (59a6afd). Report is 1195 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #4993 +/- ##
==========================================
- Coverage 53.33% 48.12% -5.21%
==========================================
Files 252 668 +416
Lines 20482 55291 +34809
==========================================
+ Hits 10924 26609 +15685
- Misses 8836 26948 +18112
- Partials 722 1734 +1012
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 48.12% <ø> (-5.21%) | :arrow_down: |

Flags with carried forward coverage won't be shown.
Thanks~ /assign
/assign @RainbowMango
Hi @whitewindmills I guess it's a good chance to introduce this proposal at tomorrow's community meeting, what do you think?
hi all, can we continue this proposal?
+1
Hi all,
Thank you all for the amazing feedback today! In summary, our design currently targets only resources that are scheduled to a single cluster, and preemption only happens among the bindings within one cluster.
Here are the points we agree with in this proposal:
- The scheduler queue needs to be redesigned so that it is priority-aware and pops the `resource binding` with the highest priority first.
- Reuse the native `priorityClass`, and create a new API only if we need to extend the functionality in the future.
- Resolve the `priority` value and `preemptionPolicy`/`preemptionBehavior` onto the resource binding so that the scheduler won't need to query the `priorityClass` to find the values.
- If scheduling fails due to insufficient resources (or because no feasible clusters can be found), we should continue attempting to schedule other pending `ResourceBindings` instead of blocking the whole scheduling process.
- Reschedule the preempted `bindings`. (We need to carefully consider the backoff time for the preempted bindings so that they are not rescheduled before the preemptor binding.)
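The first and fourth points can be illustrated with a minimal sketch of a priority-aware queue built on Go's `container/heap`. This is not Karmada's scheduler queue: the `binding`, `priorityQueue`, and `scheduleAll` names are hypothetical, the priority is assumed to be already resolved onto the binding, and ties are assumed to be broken in FIFO order.

```go
package main

import "container/heap"

// binding is a hypothetical, simplified stand-in for a queued ResourceBinding.
type binding struct {
	name     string
	priority int32  // assumed to be resolved from the PriorityClass beforehand
	seq      uint64 // enqueue order, used as a FIFO tie-breaker
}

// bindingHeap implements heap.Interface so that the highest priority pops first.
type bindingHeap []*binding

func (h bindingHeap) Len() int { return len(h) }

func (h bindingHeap) Less(i, j int) bool {
	if h[i].priority != h[j].priority {
		return h[i].priority > h[j].priority
	}
	return h[i].seq < h[j].seq
}

func (h bindingHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }

func (h *bindingHeap) Push(x any) { *h = append(*h, x.(*binding)) }

func (h *bindingHeap) Pop() any {
	old := *h
	n := len(old)
	item := old[n-1]
	old[n-1] = nil // drop the reference so it can be garbage collected
	*h = old[:n-1]
	return item
}

// priorityQueue wraps the heap and assigns sequence numbers on insertion.
type priorityQueue struct {
	items bindingHeap
	next  uint64
}

// Add enqueues a binding with its resolved priority.
func (q *priorityQueue) Add(name string, priority int32) {
	heap.Push(&q.items, &binding{name: name, priority: priority, seq: q.next})
	q.next++
}

// Pop returns the highest-priority binding, or false if the queue is empty.
func (q *priorityQueue) Pop() (*binding, bool) {
	if q.items.Len() == 0 {
		return nil, false
	}
	return heap.Pop(&q.items).(*binding), true
}

// scheduleAll drains the queue. A binding that fails to schedule is recorded
// as unschedulable and the loop moves on, so one failure does not block the rest.
func scheduleAll(q *priorityQueue, try func(*binding) bool) (scheduled, unschedulable []string) {
	for {
		b, ok := q.Pop()
		if !ok {
			return scheduled, unschedulable
		}
		if try(b) {
			scheduled = append(scheduled, b.name)
		} else {
			unschedulable = append(unschedulable, b.name)
		}
	}
}
```

In a real scheduler the unschedulable bindings would go to a separate backoff queue rather than a slice; the slice here only makes the non-blocking behavior visible.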
These are the points where we have different views or additional questions and thoughts:
- Could you please clarify whether any use cases require binding the `priorityClassName` to the `PropagationPolicy`/`ClusterPropagationPolicy`? Is it perhaps because you don't want to enforce that all resources must have a `priorityClassName` field? We plan to put the `priority` and `preemptionBehavior`/`preemptionPolicy` in the `replicaRequirements`, so users can simply specify the `priorityClassName` in the resource and create the resource the same way they do in a single cluster. We can use the `custom resource interpreter` to propagate the `priority` and `preemptionPolicy` to the `replicaRequirements`.
- We propose using `scheduler-estimators` to find the victim bindings, since they have accurate information about the clusters; the scheduler will then use the victim bindings to decide which bindings to preempt and perform the preemption.
- If possible, we would like to have an option, a feature gate, or a field in the binding to control whether preempted bindings should be rescheduled.
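To make the victim-selection point concrete, here is a rough sketch of how victims might be chosen within one cluster. It is an illustration under strong assumptions, not the estimator's actual logic: `candidate` and `selectVictims` are hypothetical names, resources are reduced to a single abstract unit, and only bindings with strictly lower priority than the preemptor are considered, lowest priority first.

```go
package main

import "sort"

// candidate is a hypothetical view of a binding already running in a cluster.
type candidate struct {
	name     string
	priority int32
	request  int64 // resources held in the cluster, in one abstract unit
}

// selectVictims returns the bindings to preempt so that `needed` units fit
// into the cluster, given `free` units available now. The boolean is false
// when even evicting every lower-priority binding would not free enough.
func selectVictims(preemptorPriority int32, needed, free int64, running []candidate) ([]candidate, bool) {
	if needed <= free {
		return nil, true // fits without preemption
	}

	// Only bindings with strictly lower priority may become victims.
	var lower []candidate
	for _, c := range running {
		if c.priority < preemptorPriority {
			lower = append(lower, c)
		}
	}

	// Evict the lowest-priority bindings first; the stable sort keeps the
	// input order among bindings with equal priority.
	sort.SliceStable(lower, func(i, j int) bool {
		return lower[i].priority < lower[j].priority
	})

	var victims []candidate
	for _, c := range lower {
		victims = append(victims, c)
		free += c.request
		if free >= needed {
			return victims, true
		}
	}
	return nil, false
}
```

For example, with running bindings of priority 10, 50, and 100 holding 2, 3, and 4 units, a preemptor of priority 80 that needs 5 units with 1 unit free would get the first two bindings back as victims. The scheduler could then compare such per-cluster victim lists when deciding where to preempt.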
Thanks for your time again! Please feel free to provide any comments.
Reference: Our Proposal - https://docs.google.com/document/d/1MixmgLwnmiRrukyFP25JvE2hqiqB3f_JKT12tXiOsbc/edit?tab=t.0
cc @RainbowMango @kevin-wangzefeng @wengyao04 @LeonZh0u
@RainbowMango can this proposal be merged?
I want to give it another look and see if there are any comments from the others.
@kevin-wangzefeng @RainbowMango PTAL
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [RainbowMango]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment