
Reschedule ResourceBinding when adding a cluster

Open: chaunceyjiang opened this pull request 2 years ago • 10 comments

Signed-off-by: chaunceyjiang [email protected]

What type of PR is this? /kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes: Fixes #2261

Special notes for your reviewer: When a new cluster joins, resources will be propagated to it if the Placement is empty or the replicaSchedulingType is Duplicated.

Does this PR introduce a user-facing change?:

Reschedule ResourceBinding when adding a cluster

chaunceyjiang avatar Aug 01 '22 15:08 chaunceyjiang

Is it ready for review now? Please assign it to me once it gets ready.

RainbowMango avatar Aug 06 '22 08:08 RainbowMango

Is it ready for review now? Please assign it to me once it gets ready.

Ready!

chaunceyjiang avatar Aug 08 '22 09:08 chaunceyjiang

/assign @Garrybest @XiShanYongYe-Chang @RainbowMango

chaunceyjiang avatar Aug 08 '22 09:08 chaunceyjiang

Got it. Thanks. I can look at it tomorrow.

RainbowMango avatar Aug 08 '22 11:08 RainbowMango

[screenshot: failing CI E2E check]

This E2E test failed in this case:

[screenshot: the failed test case]

But it passes in my GitHub Actions run.

chaunceyjiang avatar Aug 09 '22 02:08 chaunceyjiang

I see your commit is based on the July 30 commits. Could you rebase your code and try again?

Actually, I re-triggered the E2E once this morning.

RainbowMango avatar Aug 09 '22 03:08 RainbowMango

Rebase your code and try again?

ok

Maybe my code is not robust enough. Let me think again

chaunceyjiang avatar Aug 09 '22 03:08 chaunceyjiang

/assign @Garrybest @XiShanYongYe-Chang @RainbowMango @dddddai

chaunceyjiang avatar Aug 12 '22 02:08 chaunceyjiang

Can we have a talk at the next meeting?

ok

chaunceyjiang avatar Aug 12 '22 08:08 chaunceyjiang

Hi @chaunceyjiang, this PR is closely related to #2391; I hope these two PRs can work together to solve the rescheduling issues.

Now we have decided to move Failover from the scheduler to the controller-manager, so when a cluster becomes not-ready or unreachable for a period, the controller-manager will add a NoExecute taint to it, which triggers an eviction from that cluster. #2391 tries to remove Failover from the scheduler, but we still need a rescheduling procedure to make Failover complete.

In conclusion, I hope this PR handles not only adding a cluster but also more circumstances. Here are some ideas.

  1. More trigger events should be considered.
  • When a new cluster is added, we reschedule all RB/CRBs whose PP/CPP matches this new cluster.
  • When a cluster's labels or other fields change, we reschedule all RB/CRBs whose PP/CPP matches this updated cluster.
  2. We could schedule RB/CRBs every time in doScheduleBinding and doScheduleClusterBinding if ReplicaSchedulingType is Duplicated (a rough sketch follows after this list).
  • Currently, when the scheduler pops an RB/CRB from the queue, it does not know whether a Duplicated RB/CRB should be rescheduled, because the trigger events vary: cluster labels change, some clusters are evicted from the RB/CRB, a new cluster was just added, etc. When Failover happens, the controller-manager now evicts clusters from the RB/CRB, but if the SpreadConstraints are no longer satisfied, we still need the scheduler to reschedule. So a simple and efficient solution may be to always schedule RB/CRBs with the Duplicated policy whenever events are received.
  • BTW, we don't need to care about RB/CRBs with the Divided policy: when spec.replicas equals the sum of the target scheduled replicas, we will never reschedule the RB/CRB.
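
A minimal sketch of the "always schedule Duplicated bindings" rule, using hypothetical stand-in types rather than the real Karmada policy API:

```go
package main

import "fmt"

// Illustrative stand-ins for the Karmada placement types; the real names live
// in policy/v1alpha1, so treat these definitions as assumptions.
type ReplicaSchedulingType string

const ReplicaSchedulingTypeDuplicated ReplicaSchedulingType = "Duplicated"

type SchedulingInput struct {
	ReplicaSchedulingType ReplicaSchedulingType
	PlacementChanged      bool
	ReplicasChanged       bool
}

// needReschedule sketches the proposed rule: keep the existing
// placement/replicas-changed checks, and additionally always reschedule
// bindings whose policy is Duplicated, because the scheduler cannot tell
// which cluster event (add, label change, eviction) re-queued the binding.
func needReschedule(in SchedulingInput) bool {
	if in.PlacementChanged || in.ReplicasChanged {
		return true
	}
	return in.ReplicaSchedulingType == ReplicaSchedulingTypeDuplicated
}

func main() {
	fmt.Println(needReschedule(SchedulingInput{ReplicaSchedulingType: ReplicaSchedulingTypeDuplicated})) // true
}
```

The point of this sketch is that the scheduler treats any requeue of a Duplicated binding as a reschedule request instead of trying to classify the triggering event.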

/cc @RainbowMango @chaunceyjiang @dddddai @XiShanYongYe-Chang

Garrybest avatar Aug 18 '22 14:08 Garrybest

@Garrybest: GitHub didn't allow me to request PR reviews from the following users: chaunceyjiang.

Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to the /cc in the comment above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

karmada-bot avatar Aug 18 '22 14:08 karmada-bot

We could schedule RB/CRB every time in doScheduleBinding and doScheduleClusterBinding if ReplicaSchedulingType is Duplicated.

BTW, we don't need to care about RB/CRB with Divided policy. When spec.replicas equals to sum of target scheduled replicas, we will never reschedule this RB/CRB.

Can we just put all related RB/CRBs into the queue and not care whether their ReplicaSchedulingType is Duplicated or not? For the Divided scenario, does it mean a no-op?

RainbowMango avatar Aug 19 '22 02:08 RainbowMango

Putting them into the queue does not trigger scheduling; currently we only reschedule when the placement or replicas change, please see here

Garrybest avatar Aug 19 '22 07:08 Garrybest

When labels or cluster fields change, we reschedule all RB/CRBs whose PP/CPP is corresponding with this updated cluster.

Do you mean to reschedule RB/CRBs when the label of the member cluster changes?

chaunceyjiang avatar Aug 22 '22 07:08 chaunceyjiang

Right. Based on #2391, this PR can be simplified: when a cluster is added or its labels change, we only need to add the corresponding RB/CRBs to the queue, and RB/CRBs with the Duplicated policy will always be rescheduled.
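
A rough sketch of that simplified event handling, with hypothetical stand-in types instead of the real informer and work-queue wiring:

```go
package main

import (
	"fmt"
	"reflect"
)

// Illustrative stand-ins; the real scheduler wires this up through client-go
// informers and a work queue, so treat these names as assumptions.
type Cluster struct {
	Name   string
	Labels map[string]string
}

// clusterEventHandler sketches the simplified flow: on cluster add, or when
// cluster labels change, enqueue the RB/CRBs whose PP/CPP references this
// cluster; Duplicated bindings are then rescheduled unconditionally.
type clusterEventHandler struct {
	enqueueAffectedBindings func(clusterName string)
}

func (h *clusterEventHandler) OnAdd(c *Cluster) {
	h.enqueueAffectedBindings(c.Name)
}

func (h *clusterEventHandler) OnUpdate(oldCluster, newCluster *Cluster) {
	if !reflect.DeepEqual(oldCluster.Labels, newCluster.Labels) {
		h.enqueueAffectedBindings(newCluster.Name)
	}
}

func main() {
	h := &clusterEventHandler{enqueueAffectedBindings: func(name string) {
		fmt.Println("enqueue RB/CRBs affected by cluster", name)
	}}
	h.OnAdd(&Cluster{Name: "member1"})
	h.OnUpdate(
		&Cluster{Name: "member1", Labels: map[string]string{"zone": "a"}},
		&Cluster{Name: "member1", Labels: map[string]string{"zone": "b"}},
	)
}
```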

Garrybest avatar Aug 22 '22 08:08 Garrybest

/assign

Garrybest avatar Aug 22 '22 09:08 Garrybest

@chaunceyjiang @RainbowMango @Garrybest

The CI E2E test failed with:

2022-08-25T04:54:23.435597696Z stderr F I0825 04:54:23.435529       1 api_enablement.go:38] Cluster(member-e2e-7xk) not fit as missing API(apps/v1, kind=Deployment)
2022-08-25T04:54:23.435998391Z stderr F I0825 04:54:23.435956       1 generic_scheduler.go:108] cluster "member-e2e-7xk" is not fit, reason: no such API resource

When the scheduler detects the cluster create event, the cluster's API information has not been collected yet. As a result, the scheduler filters out the new cluster.

Currently, the modified logic does not watch for cluster status updates, so rescheduling is not triggered after the cluster API collection completes.

Therefore, we need to adjust which cluster events trigger rescheduling.
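
One possible way to adjust the cluster events, sketched with hypothetical stand-in types rather than the real Cluster API:

```go
package main

import (
	"fmt"
	"reflect"
)

// Illustrative stand-ins; the collected API list of a member cluster lives in
// its status in the real API, so these types and field names are assumptions.
type ClusterStatus struct {
	APIEnablements []string // e.g. "apps/v1, kind=Deployment"
}

type Cluster struct {
	Name   string
	Status ClusterStatus
}

// shouldRequeueOnUpdate sketches the adjusted cluster-update filter: requeue
// the affected RB/CRBs once the collected API enablements change, so a
// cluster that was filtered out by the APIEnablement plugin right after
// joining gets reconsidered when its status collection completes.
func shouldRequeueOnUpdate(oldCluster, newCluster *Cluster) bool {
	return !reflect.DeepEqual(oldCluster.Status.APIEnablements, newCluster.Status.APIEnablements)
}

func main() {
	oldCluster := &Cluster{Name: "member-e2e-7xk"}
	newCluster := &Cluster{
		Name:   "member-e2e-7xk",
		Status: ClusterStatus{APIEnablements: []string{"apps/v1, kind=Deployment"}},
	}
	fmt.Println(shouldRequeueOnUpdate(oldCluster, newCluster)) // true
}
```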

XiShanYongYe-Chang avatar Aug 25 '22 12:08 XiShanYongYe-Chang

Thanks @XiShanYongYe-Chang, that's the root cause.

RainbowMango avatar Aug 25 '22 12:08 RainbowMango

Thank you very much @XiShanYongYe-Chang

But why does #2427 also make this PR pass the E2E test?

https://github.com/chaunceyjiang/karmada/actions/runs/2925842870

chaunceyjiang avatar Aug 25 '22 12:08 chaunceyjiang

But why does https://github.com/karmada-io/karmada/pull/2427 also make this PR pass the E2E test?

It seems the failing E2E is in this patch, not in #2427.

RainbowMango avatar Aug 25 '22 13:08 RainbowMango

But why can #2427 also make this PR pass the E2E test?

Due to the removal of the taint (after the cluster status collection is complete), rescheduling will be triggered, which has the desired effect.

But taints should be added cautiously and reasonably: the cluster is reachable when it has just joined, but its status is not yet synchronized.

XiShanYongYe-Chang avatar Aug 25 '22 13:08 XiShanYongYe-Chang

Due to the removal of the taint (after the cluster status collection is complete), rescheduling will be triggered, which has the desired effect.

Yes, this is what happens when I test locally.

but the status is not synchronized.

https://github.com/karmada-io/karmada/blob/2daaa2d8c7984fe96f224d7440da6b2fb04111c3/pkg/controllers/cluster/cluster_controller.go#L608-L610

That's why I think that when currentReadyCondition is nil, we should add a NotReady taint to the cluster.
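
A minimal sketch of that suggestion, with hypothetical stand-in types and an assumed taint key/effect:

```go
package main

import "fmt"

// Illustrative stand-ins; the real controller works with metav1.Condition and
// corev1.Taint, and the actual taint key and effect here are assumptions.
type Condition struct {
	Type   string
	Status string
}

type Taint struct {
	Key    string
	Effect string
}

// notReadyTaintFor sketches the suggestion: while the Ready condition has not
// been reported yet (currentReadyCondition is nil) or is not True, keep a
// not-ready taint on the cluster so TaintToleration filters it out until
// status collection completes.
func notReadyTaintFor(currentReadyCondition *Condition) []Taint {
	if currentReadyCondition == nil || currentReadyCondition.Status != "True" {
		return []Taint{{Key: "cluster.karmada.io/not-ready", Effect: "NoSchedule"}}
	}
	return nil
}

func main() {
	fmt.Println(notReadyTaintFor(nil)) // taint present until the cluster reports Ready
}
```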

chaunceyjiang avatar Aug 25 '22 13:08 chaunceyjiang

In other words, newly joined clusters should be filtered out by the scheduling plugin TaintToleration instead of the APIEnablement plugin.

chaunceyjiang avatar Aug 25 '22 13:08 chaunceyjiang

In other words, newly joined clusters should be filtered out by the scheduling plugin TaintToleration instead of the APIEnablement plugin.

+1.

Can we continue the discussion on #2427? I'm afraid we can't reach a consensus quickly. The question to discuss would be: should a new cluster get a default taint until it becomes ready?

RainbowMango avatar Aug 25 '22 14:08 RainbowMango

Given that the new release is coming soon, I hope this patch can be included. Can we move forward with it?

RainbowMango avatar Aug 26 '22 01:08 RainbowMango

should a new cluster get a default taint until it becomes ready?

Yes, that's what I wanted to say.

chaunceyjiang avatar Aug 26 '22 02:08 chaunceyjiang

@chaunceyjiang Do you want to wait for #2427?

Can you update the patch in the tradeoff way like https://github.com/karmada-io/karmada/pull/2301#discussion_r954940371 and have a try?

RainbowMango avatar Aug 26 '22 07:08 RainbowMango

@chaunceyjiang Do you want to wait for https://github.com/karmada-io/karmada/pull/2427?

Yes.

Can you update the patch in the tradeoff way like https://github.com/karmada-io/karmada/pull/2301#discussion_r954940371 and have a try?

I can try.

chaunceyjiang avatar Aug 26 '22 07:08 chaunceyjiang

@chaunceyjiang Now that #2427 has been merged, could you please revise and rebase your code?

RainbowMango avatar Aug 27 '22 02:08 RainbowMango

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

karmada-bot avatar Aug 27 '22 03:08 karmada-bot