Reschedule ResourceBinding when adding a cluster
Signed-off-by: chaunceyjiang [email protected]
What type of PR is this? /kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes: Fixes #2261
Special notes for your reviewer:
When a new cluster joins, if the Placement is empty or the replicaSchedulingType is Duplicated, resources will be propagated to the new cluster.
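For reviewers, a minimal sketch of the condition described above, using simplified stand-in types rather than the actual Karmada API (the type and function names here are illustrative only):

```go
package main

import "fmt"

// Simplified stand-ins for the Karmada placement fields involved here; the
// real definitions live in the karmada-io/karmada API packages.
type ReplicaSchedulingType string

const Duplicated ReplicaSchedulingType = "Duplicated"

type Placement struct {
	ClusterNames          []string              // empty means no cluster restriction
	ReplicaSchedulingType ReplicaSchedulingType // "Duplicated" or "Divided"
}

// placementMatchesNewCluster mirrors the rule described above: propagate to a
// newly joined cluster when the Placement is empty, or when replicas are
// duplicated to every matching cluster.
func placementMatchesNewCluster(p *Placement) bool {
	if p == nil || len(p.ClusterNames) == 0 {
		return true
	}
	return p.ReplicaSchedulingType == Duplicated
}

func main() {
	fmt.Println(placementMatchesNewCluster(nil)) // true: empty placement
}
```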
Does this PR introduce a user-facing change?:
Reschedule ResourceBinding when adding a cluster
Is it ready for review now? Please assign it to me once it gets ready.
Is it ready for review now? Please assign it to me once it gets ready.
Ready!
/assign @Garrybest @XiShanYongYe-Chang @RainbowMango
Got it. Thanks. I can look at it tomorrow.

This E2E test failed in this case, but it is successful in my GitHub Actions run.
I see your commit is based on the commits from 7.30 (July 30). Could you rebase your code and try again?
Actually, I re-triggered the E2E once this morning.
Rebase your code and try again?
ok
Maybe my code is not robust enough. Let me think it over again.
/assign @Garrybest @XiShanYongYe-Chang @RainbowMango @dddddai
Can we have a talk at the next meeting?
ok
Hi @chaunceyjiang, this PR is closely related to #2391; I hope these two PRs can work together to solve the rescheduling issues.
Now we have decided to move Failover from the scheduler to the controller-manager, so when a cluster becomes not-ready or unreachable for a period, the controller-manager adds a NoExecute taint on it, which triggers an eviction procedure from this cluster. #2391 tries to remove Failover from the scheduler; however, we still need a rescheduling procedure to make Failover complete.
In conclusion, I hope this PR is not only for adding a cluster but also for more circumstances. Here are some ideas.
- More trigger events should be considered.
- When adding a new cluster, we reschedule all RB/CRBs whose PP/CPP corresponds to this new cluster.
- When labels or cluster fields change, we reschedule all RB/CRBs whose PP/CPP corresponds to this updated cluster.
- We could schedule RB/CRB every time in doScheduleBinding and doScheduleClusterBinding if ReplicaSchedulingType is Duplicated (a rough sketch follows this list).
- Now when the scheduler pops a RB/CRB from the queue, it does not know whether the RB/CRB should be rescheduled again when Duplicated, because the trigger events are various, e.g. cluster labels change, some clusters are evicted from the RB/CRB, a new cluster was just added, etc. When Failover happens, the controller-manager now evicts clusters from the RB/CRB, but if SpreadConstraints does not match, we still need the scheduler to reschedule again. So it may be a simple and efficient solution that we always schedule RB/CRB with the Duplicated policy when events are received.
- BTW, we don't need to care about RB/CRB with the Divided policy. When spec.replicas equals the sum of target scheduled replicas, we will never reschedule this RB/CRB.
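A rough sketch of the rescheduling rule proposed above, using simplified stand-in types rather than the actual scheduler code (field and function names here are illustrative):

```go
package main

import "fmt"

type ReplicaSchedulingType string

const (
	Duplicated ReplicaSchedulingType = "Duplicated"
	Divided    ReplicaSchedulingType = "Divided"
)

// TargetCluster is a simplified stand-in for a scheduled cluster entry in a RB/CRB.
type TargetCluster struct {
	Name     string
	Replicas int32
}

// needsRescheduling captures the idea above: a Duplicated RB/CRB is scheduled
// on every pass because the trigger events vary, while a Divided RB/CRB is
// left alone once its assigned replicas add up to spec.replicas.
func needsRescheduling(policy ReplicaSchedulingType, specReplicas int32, scheduled []TargetCluster) bool {
	if policy == Duplicated {
		return true
	}
	var assigned int32
	for _, c := range scheduled {
		assigned += c.Replicas
	}
	return assigned != specReplicas
}

func main() {
	fmt.Println(needsRescheduling(Duplicated, 3, nil))                          // true: always reschedule
	fmt.Println(needsRescheduling(Divided, 3, []TargetCluster{{"member1", 3}})) // false: replicas already satisfied
}
```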
/cc @RainbowMango @chaunceyjiang @dddddai @XiShanYongYe-Chang
@Garrybest: GitHub didn't allow me to request PR reviews from the following users: chaunceyjiang.
Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs.
We could schedule RB/CRB every time in doScheduleBinding and doScheduleClusterBinding if ReplicaSchedulingType is Duplicated.
BTW, we don't need to care about RB/CRB with Divided policy. When spec.replicas equals the sum of target scheduled replicas, we will never reschedule this RB/CRB.
Can we just put all related RB/CRBs into the queue, and not care whether their ReplicaSchedulingType is Duplicated or not?
For the Divided scenario, does it mean a no-op?
Putting them into the queue does not trigger scheduling; currently we only reschedule when placement or replicas change, please see here.
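For context, a minimal sketch of that "only reschedule when placement or replicas change" check, under the assumption that the scheduler compares a recorded placement string and replica count; the types and names below are simplified stand-ins, not the actual scheduler code:

```go
package main

import "fmt"

// BindingSnapshot is a simplified view of what was recorded at the last
// successful scheduling: the serialized placement and the replica count.
type BindingSnapshot struct {
	AppliedPlacement string
	Replicas         int32
}

// scheduleNeeded sketches the current behaviour: popping a RB/CRB from the
// queue only leads to a real scheduling pass when the placement text or the
// replica count differs from what was applied last time.
func scheduleNeeded(last BindingSnapshot, currentPlacement string, currentReplicas int32) bool {
	return last.AppliedPlacement != currentPlacement || last.Replicas != currentReplicas
}

func main() {
	last := BindingSnapshot{AppliedPlacement: `{"replicaScheduling":{"replicaSchedulingType":"Duplicated"}}`, Replicas: 2}
	fmt.Println(scheduleNeeded(last, last.AppliedPlacement, 2)) // false: nothing to do
	fmt.Println(scheduleNeeded(last, last.AppliedPlacement, 3)) // true: replicas changed
}
```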
When labels or cluster fields change, we reschedule all RB/CRBs whose PP/CPP is corresponding with this updated cluster.
Do you mean to reschedule RB/CRBs when the label of the member cluster changes?
Right, based on #2391, this PR could be simplified. When a cluster is added, or cluster labels change, we only need to add the corresponding RB/CRBs into the queue. Duplicated RB/CRBs will always be scheduled on every pass.
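A sketch of that simplified flow, assuming a hypothetical enqueue helper and simplified types; the real event handlers and listers in karmada-scheduler differ:

```go
package main

import (
	"fmt"
	"reflect"
)

// Cluster and Binding are simplified stand-ins for the Karmada API objects.
type Cluster struct {
	Name   string
	Labels map[string]string
}

type Binding struct {
	Name           string
	MatchesCluster func(c Cluster) bool // whether the bound PP/CPP selects this cluster
}

// onClusterAddOrLabelChange enqueues every RB/CRB whose policy corresponds to
// the added or updated cluster; Duplicated bindings are then rescheduled when
// they are popped from the queue.
func onClusterAddOrLabelChange(old, cur Cluster, bindings []Binding, enqueue func(key string)) {
	labelsChanged := !reflect.DeepEqual(old.Labels, cur.Labels)
	if old.Name != "" && !labelsChanged {
		return // an update that did not touch labels is ignored in this sketch
	}
	for _, b := range bindings {
		if b.MatchesCluster(cur) {
			enqueue(b.Name)
		}
	}
}

func main() {
	bindings := []Binding{{Name: "demo-rb", MatchesCluster: func(Cluster) bool { return true }}}
	onClusterAddOrLabelChange(Cluster{}, Cluster{Name: "member3"}, bindings, func(key string) {
		fmt.Println("enqueue", key)
	})
}
```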
/assign
@chaunceyjiang @RainbowMango @Garrybest
The CI e2e test failed for:
2022-08-25T04:54:23.435597696Z stderr F I0825 04:54:23.435529 1 api_enablement.go:38] Cluster(member-e2e-7xk) not fit as missing API(apps/v1, kind=Deployment)
2022-08-25T04:54:23.435998391Z stderr F I0825 04:54:23.435956 1 generic_scheduler.go:108] cluster "member-e2e-7xk" is not fit, reason: no such API resource
When the scheduler detects the cluster create event, the cluster API information has not been collected yet. As a result, the scheduler filters out the new cluster.
The current modification logic does not react to the cluster status update, so rescheduling is not triggered after the cluster API collection is complete.
Therefore, we need to adjust which cluster events trigger rescheduling.
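A hedged sketch of the kind of cluster-update filter this implies, using a simplified status shape; the field names here are assumptions for illustration, not the actual Karmada Cluster API:

```go
package main

import (
	"fmt"
	"reflect"
)

// ClusterStatus is a simplified stand-in; in Karmada the collected API
// information is reported under the cluster status (API enablements).
type ClusterStatus struct {
	Ready          bool
	APIEnablements []string // e.g. "apps/v1/Deployment"
}

// clusterUpdateNeedsReschedule reacts to the status collection finishing:
// once the API enablements or readiness change, affected RB/CRBs should be
// re-enqueued so the new cluster is no longer filtered out by APIEnablement.
func clusterUpdateNeedsReschedule(old, cur ClusterStatus) bool {
	if old.Ready != cur.Ready {
		return true
	}
	return !reflect.DeepEqual(old.APIEnablements, cur.APIEnablements)
}

func main() {
	before := ClusterStatus{Ready: false}
	after := ClusterStatus{Ready: true, APIEnablements: []string{"apps/v1/Deployment"}}
	fmt.Println(clusterUpdateNeedsReschedule(before, after)) // true: status collection finished
}
```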
Thanks @XiShanYongYe-Chang, that's the root cause.
Thank you very much @XiShanYongYe-Chang
But why can #2427 also make this PR pass the E2E test?
https://github.com/chaunceyjiang/karmada/actions/runs/2925842870
But why can https://github.com/karmada-io/karmada/pull/2427 also make this PR pass the E2E test?
Seems the failing E2E is in this patch, not in #2427.
But why can #2427 also make this PR pass the E2E test?
Due to the removal of the taint (after the cluster status collection is complete), the re-scheduling of the scheduler will be triggered, which will have the desired effect.
But adding taints should be done cautiously and reasonably; the cluster is reachable when it has just joined, but its status has not been synchronized yet.
Due to the removal of the taint (after the cluster status collection is complete), the re-scheduling of the scheduler will be triggered, which will have the desired effect.
Yes, this is what happens when I test locally.
but the status is not synchronized.
https://github.com/karmada-io/karmada/blob/2daaa2d8c7984fe96f224d7440da6b2fb04111c3/pkg/controllers/cluster/cluster_controller.go#L608-L610
That's why I think when currentReadyCondition is nil, we should add a NotReady taint to the cluster.
In other words, newly joined clusters should be filtered out by the TaintToleration scheduling plugin instead of the APIEnablement plugin.
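A minimal sketch of the NotReady-taint idea, using the standard corev1.Taint type; the taint key and effect shown are assumptions for illustration, since the exact choice is what #2427 settles:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// notReadyTaint is an illustrative taint; the exact key and effect used by
// the cluster controller are decided in the related discussion, not here.
var notReadyTaint = corev1.Taint{
	Key:    "cluster.karmada.io/not-ready", // assumed key, for illustration only
	Effect: corev1.TaintEffectNoSchedule,
}

// ensureNotReadyTaint sketches the idea: when currentReadyCondition is nil
// (status not collected yet), make sure the cluster carries a NotReady taint
// so TaintToleration, not APIEnablement, filters it out during scheduling.
func ensureNotReadyTaint(taints []corev1.Taint, readyConditionKnown bool) []corev1.Taint {
	if readyConditionKnown {
		return taints
	}
	for _, t := range taints {
		if t.Key == notReadyTaint.Key && t.Effect == notReadyTaint.Effect {
			return taints // already tainted
		}
	}
	return append(taints, notReadyTaint)
}

func main() {
	taints := ensureNotReadyTaint(nil, false)
	fmt.Printf("%d taint(s): %s\n", len(taints), taints[0].Key)
}
```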
In other words, newly joined clusters should be filtered out by the scheduling plugin TaintToleration instead of the APIEnablement plugin.
+1.
Can we continue the discussion on #2427? I'm afraid we can't reach a consensus quickly.
The discussion would be: should a new cluster get a default taint until it becomes ready?
Given that the new release is coming soon, I hope this patch can be included. Can we move forward with it?
should a new cluster get a default taint until it becomes ready?
Yes, that's what I wanted to say.
@chaunceyjiang Do you want to wait for #2427?
Can you update the patch in the tradeoff way like https://github.com/karmada-io/karmada/pull/2301#discussion_r954940371 and have a try?
@chaunceyjiang Do you want to wait for https://github.com/karmada-io/karmada/pull/2427?
Yes.
Can you update the patch in the tradeoff way like https://github.com/karmada-io/karmada/pull/2301#discussion_r954940371 and have a try?
I can try.
@chaunceyjiang Now that #2427 has been merged, could you please revise and rebase your code?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [RainbowMango]
Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.