
Reschedule ResourceBinding when adding a cluster

Open: chaunceyjiang opened this pull request 2 years ago • 10 comments

Signed-off-by: chaunceyjiang [email protected]

What type of PR is this? /kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes: Fixes #2261

Special notes for your reviewer: When a new cluster joins, resources will be propagated to it if the Placement is empty or the replicaSchedulingType is Duplicated.

Does this PR introduce a user-facing change?:

Reschedule ResourceBinding when adding a cluster

chaunceyjiang avatar Aug 01 '22 15:08 chaunceyjiang

Is it ready for review now? Please assign it to me once it gets ready.

RainbowMango avatar Aug 06 '22 08:08 RainbowMango

Is it ready for review now? Please assign it to me once it gets ready.

Ready!

chaunceyjiang avatar Aug 08 '22 09:08 chaunceyjiang

/assign @Garrybest @XiShanYongYe-Chang @RainbowMango

chaunceyjiang avatar Aug 08 '22 09:08 chaunceyjiang

Got it. Thanks. I can look at it tomorrow.

RainbowMango avatar Aug 08 '22 11:08 RainbowMango

[screenshot: failing CI E2E check]

This E2E test failed in this case:

[screenshot: the failed test case]

But it passes in my GitHub Actions run.

chaunceyjiang avatar Aug 09 '22 02:08 chaunceyjiang

I see your commit is based on the July 30 commits. Could you rebase your code and try again?

Actually, I re-triggered the E2E once this morning.

RainbowMango avatar Aug 09 '22 03:08 RainbowMango

Rebase your code and try again?

ok

Maybe my code is not robust enough. Let me think again

chaunceyjiang avatar Aug 09 '22 03:08 chaunceyjiang

/assign @Garrybest @XiShanYongYe-Chang @RainbowMango @dddddai

chaunceyjiang avatar Aug 12 '22 02:08 chaunceyjiang

Can we have a talk at the next meeting?

ok

chaunceyjiang avatar Aug 12 '22 08:08 chaunceyjiang

Hi @chaunceyjiang, this PR is closely related to #2391; I hope these two PRs can work together to solve the rescheduling issues.

Now we have decided to move Failover from the scheduler to the controller-manager, so when a cluster becomes not-ready or unreachable for a period, the controller-manager will add a NoExecute taint to it, which triggers an eviction from that cluster. #2391 tries to remove Failover from the scheduler, but we still need a rescheduling procedure to make Failover complete.

In conclusion, I hope this PR handles not only adding a cluster but also more circumstances. Here are some ideas.

  1. More trigger events should be considered.
  • When a new cluster is added, we reschedule all RB/CRBs whose PP/CPP matches this new cluster.
  • When a cluster's labels or other fields change, we reschedule all RB/CRBs whose PP/CPP matches this updated cluster.
  2. We could schedule RB/CRBs every time in doScheduleBinding and doScheduleClusterBinding if ReplicaSchedulingType is Duplicated (a rough sketch follows after this list).
  • Currently, when the scheduler pops an RB/CRB from the queue, it does not know whether a Duplicated RB/CRB should be rescheduled, because the trigger events vary: cluster labels change, some clusters are evicted from the RB/CRB, a new cluster was just added, etc. When Failover happens, the controller-manager now evicts clusters from the RB/CRB, but if the SpreadConstraints are no longer satisfied, we still need the scheduler to reschedule. So a simple and efficient solution may be to always schedule RB/CRBs with the Duplicated policy whenever events are received.
  • BTW, we don't need to care about RB/CRBs with the Divided policy: when spec.replicas equals the sum of the target scheduled replicas, we will never reschedule the RB/CRB.
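
A minimal sketch of the "always schedule Duplicated bindings" rule, using hypothetical stand-in types rather than the real Karmada policy API:

```go
package main

import "fmt"

// Illustrative stand-ins for the Karmada placement types; the real names live
// in policy/v1alpha1, so treat these definitions as assumptions.
type ReplicaSchedulingType string

const ReplicaSchedulingTypeDuplicated ReplicaSchedulingType = "Duplicated"

type SchedulingInput struct {
	ReplicaSchedulingType ReplicaSchedulingType
	PlacementChanged      bool
	ReplicasChanged       bool
}

// needReschedule sketches the proposed rule: keep the existing
// placement/replicas-changed checks, and additionally always reschedule
// bindings whose policy is Duplicated, because the scheduler cannot tell
// which cluster event (add, label change, eviction) re-queued the binding.
func needReschedule(in SchedulingInput) bool {
	if in.PlacementChanged || in.ReplicasChanged {
		return true
	}
	return in.ReplicaSchedulingType == ReplicaSchedulingTypeDuplicated
}

func main() {
	fmt.Println(needReschedule(SchedulingInput{ReplicaSchedulingType: ReplicaSchedulingTypeDuplicated})) // true
}
```

The point of this sketch is that the scheduler treats any requeue of a Duplicated binding as a reschedule request instead of trying to classify the triggering event.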

/cc @RainbowMango @chaunceyjiang @dddddai @XiShanYongYe-Chang

Garrybest avatar Aug 18 '22 14:08 Garrybest

@Garrybest: GitHub didn't allow me to request PR reviews from the following users: chaunceyjiang.

Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to the /cc in the comment above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

karmada-bot avatar Aug 18 '22 14:08 karmada-bot

We could schedule RB/CRB every time in doScheduleBinding and doScheduleClusterBinding if ReplicaSchedulingType is Duplicated.

BTW, we don't need to care about RB/CRB with Divided policy. When spec.replicas equals to sum of target scheduled replicas, we will never reschedule this RB/CRB.

Can we just put all related RB/CRBs into the queue and not care whether their ReplicaSchedulingType is Duplicated or not? For the Divided scenario, does it mean a no-op?

RainbowMango avatar Aug 19 '22 02:08 RainbowMango

Putting them into the queue does not trigger scheduling; currently we only reschedule when the placement or replicas change, please see here

Garrybest avatar Aug 19 '22 07:08 Garrybest

When labels or cluster fields change, we reschedule all RB/CRBs whose PP/CPP is corresponding with this updated cluster.

Do you mean to reschedule RB/CRBs when the label of the member cluster changes?

chaunceyjiang avatar Aug 22 '22 07:08 chaunceyjiang

Right. Based on #2391, this PR can be simplified: when a cluster is added or its labels change, we only need to add the corresponding RB/CRBs to the queue, and RB/CRBs with the Duplicated policy will always be rescheduled.
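
A rough sketch of that simplified event handling, with hypothetical stand-in types instead of the real informer and work-queue wiring:

```go
package main

import (
	"fmt"
	"reflect"
)

// Illustrative stand-ins; the real scheduler wires this up through client-go
// informers and a work queue, so treat these names as assumptions.
type Cluster struct {
	Name   string
	Labels map[string]string
}

// clusterEventHandler sketches the simplified flow: on cluster add, or when
// cluster labels change, enqueue the RB/CRBs whose PP/CPP references this
// cluster; Duplicated bindings are then rescheduled unconditionally.
type clusterEventHandler struct {
	enqueueAffectedBindings func(clusterName string)
}

func (h *clusterEventHandler) OnAdd(c *Cluster) {
	h.enqueueAffectedBindings(c.Name)
}

func (h *clusterEventHandler) OnUpdate(oldCluster, newCluster *Cluster) {
	if !reflect.DeepEqual(oldCluster.Labels, newCluster.Labels) {
		h.enqueueAffectedBindings(newCluster.Name)
	}
}

func main() {
	h := &clusterEventHandler{enqueueAffectedBindings: func(name string) {
		fmt.Println("enqueue RB/CRBs affected by cluster", name)
	}}
	h.OnAdd(&Cluster{Name: "member1"})
	h.OnUpdate(
		&Cluster{Name: "member1", Labels: map[string]string{"zone": "a"}},
		&Cluster{Name: "member1", Labels: map[string]string{"zone": "b"}},
	)
}
```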

Garrybest avatar Aug 22 '22 08:08 Garrybest

/assign

Garrybest avatar Aug 22 '22 09:08 Garrybest

@chaunceyjiang @RainbowMango @Garrybest

The CI E2E test failed with:

2022-08-25T04:54:23.435597696Z stderr F I0825 04:54:23.435529       1 api_enablement.go:38] Cluster(member-e2e-7xk) not fit as missing API(apps/v1, kind=Deployment)
2022-08-25T04:54:23.435998391Z stderr F I0825 04:54:23.435956       1 generic_scheduler.go:108] cluster "member-e2e-7xk" is not fit, reason: no such API resource

When the scheduler detects the cluster create event, the cluster's API information has not been collected yet. As a result, the scheduler filters out the new cluster.

Currently, the modified logic does not watch for cluster status updates, so rescheduling is not triggered after the cluster API collection completes.

Therefore, we need to adjust which cluster events trigger rescheduling.
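
One possible way to adjust the cluster events, sketched with hypothetical stand-in types rather than the real Cluster API:

```go
package main

import (
	"fmt"
	"reflect"
)

// Illustrative stand-ins; the collected API list of a member cluster lives in
// its status in the real API, so these types and field names are assumptions.
type ClusterStatus struct {
	APIEnablements []string // e.g. "apps/v1, kind=Deployment"
}

type Cluster struct {
	Name   string
	Status ClusterStatus
}

// shouldRequeueOnUpdate sketches the adjusted cluster-update filter: requeue
// the affected RB/CRBs once the collected API enablements change, so a
// cluster that was filtered out by the APIEnablement plugin right after
// joining gets reconsidered when its status collection completes.
func shouldRequeueOnUpdate(oldCluster, newCluster *Cluster) bool {
	return !reflect.DeepEqual(oldCluster.Status.APIEnablements, newCluster.Status.APIEnablements)
}

func main() {
	oldCluster := &Cluster{Name: "member-e2e-7xk"}
	newCluster := &Cluster{
		Name:   "member-e2e-7xk",
		Status: ClusterStatus{APIEnablements: []string{"apps/v1, kind=Deployment"}},
	}
	fmt.Println(shouldRequeueOnUpdate(oldCluster, newCluster)) // true
}
```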

XiShanYongYe-Chang avatar Aug 25 '22 12:08 XiShanYongYe-Chang

Thanks @XiShanYongYe-Chang, that's the root cause.

RainbowMango avatar Aug 25 '22 12:08 RainbowMango

Thank you very much @XiShanYongYe-Chang

But why does #2427 also make this PR pass the E2E test?

https://github.com/chaunceyjiang/karmada/actions/runs/2925842870

chaunceyjiang avatar Aug 25 '22 12:08 chaunceyjiang

But why does https://github.com/karmada-io/karmada/pull/2427 also make this PR pass the E2E test?

It seems the failing E2E is in this patch, not in #2427.

RainbowMango avatar Aug 25 '22 13:08 RainbowMango

But why can #2427 also make this PR pass the E2E test?

Due to the removal of the taint (after the cluster status collection is complete), rescheduling will be triggered, which has the desired effect.

But taints should be added cautiously and reasonably: the cluster is reachable when it has just joined, but its status is not yet synchronized.

XiShanYongYe-Chang avatar Aug 25 '22 13:08 XiShanYongYe-Chang

Due to the removal of the taint (after the cluster status collection is complete), rescheduling will be triggered, which has the desired effect.

Yes, this is what happens when I test locally.

but the status is not synchronized.

https://github.com/karmada-io/karmada/blob/2daaa2d8c7984fe96f224d7440da6b2fb04111c3/pkg/controllers/cluster/cluster_controller.go#L608-L610

That's why I think that when currentReadyCondition is nil, we should add a NotReady taint to the cluster.
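
A minimal sketch of that suggestion, with hypothetical stand-in types and an assumed taint key/effect:

```go
package main

import "fmt"

// Illustrative stand-ins; the real controller works with metav1.Condition and
// corev1.Taint, and the actual taint key and effect here are assumptions.
type Condition struct {
	Type   string
	Status string
}

type Taint struct {
	Key    string
	Effect string
}

// notReadyTaintFor sketches the suggestion: while the Ready condition has not
// been reported yet (currentReadyCondition is nil) or is not True, keep a
// not-ready taint on the cluster so TaintToleration filters it out until
// status collection completes.
func notReadyTaintFor(currentReadyCondition *Condition) []Taint {
	if currentReadyCondition == nil || currentReadyCondition.Status != "True" {
		return []Taint{{Key: "cluster.karmada.io/not-ready", Effect: "NoSchedule"}}
	}
	return nil
}

func main() {
	fmt.Println(notReadyTaintFor(nil)) // taint present until the cluster reports Ready
}
```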

chaunceyjiang avatar Aug 25 '22 13:08 chaunceyjiang

In other words, newly joined clusters should be filtered out by the scheduling plugin TaintToleration instead of the APIEnablement plugin.

chaunceyjiang avatar Aug 25 '22 13:08 chaunceyjiang

In other words, newly joined clusters should be filtered out by the scheduling plugin TaintToleration instead of the APIEnablement plugin.

+1.

Can we continue the discussion on #2427? I'm afraid we can't reach a consensus quickly. The question to discuss would be: should a new cluster get a default taint until it becomes ready?

RainbowMango avatar Aug 25 '22 14:08 RainbowMango

Given that the new release is coming soon, I hope this patch can be included. Can we move forward with it?

RainbowMango avatar Aug 26 '22 01:08 RainbowMango

should a new cluster get a default taint until it becomes ready?

Yes, that's what I wanted to say.

chaunceyjiang avatar Aug 26 '22 02:08 chaunceyjiang

@chaunceyjiang Do you want to wait for #2427?

Can you update the patch in the tradeoff way like https://github.com/karmada-io/karmada/pull/2301#discussion_r954940371 and have a try?

RainbowMango avatar Aug 26 '22 07:08 RainbowMango

@chaunceyjiang Do you want to wait for https://github.com/karmada-io/karmada/pull/2427?

Yes.

Can you update the patch in the tradeoff way like https://github.com/karmada-io/karmada/pull/2301#discussion_r954940371 and have a try?

I can try.

chaunceyjiang avatar Aug 26 '22 07:08 chaunceyjiang

@chaunceyjiang Now that #2427 has been merged, could you please revise and rebase your code?

RainbowMango avatar Aug 27 '22 02:08 RainbowMango

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

karmada-bot avatar Aug 27 '22 03:08 karmada-bot