karmada icon indicating copy to clipboard operation
karmada copied to clipboard

fix: do not reschedules the workload to the same unhealthy cluster when application failover enabled

Open jwcesign opened this issue 2 years ago • 4 comments

What type of PR is this? /kind bug /kind api-change

What this PR does / why we need it:

  1. If purgeMode: Immediately is used in CPP/PP, the application-failover controller will not add the EvictionTask to (RB/CRB).spec.evictionTasks. This will result in the scheduler rescheduling to the eviction clusters.
  2. If purgeMode: Gracefully is used in CPP/PP, but the gracePeriodSeconds is too short and the scheduler has a large number of items to handle, after the grace period passes, the graceful eviction controller will remove it from (RB/CRB).spec.evictionTasks, and then the scheduler will reschedule it to the eviction clusters.

Which issue(s) this PR fixes: Fixes https://github.com/karmada-io/karmada/issues/4208

Special notes for your reviewer: Now, I refactor the logic as follows:

  1. application-failover controller adds the eviction tasks to the (RB/CRB).spec.evictionTasks, no matter what purgeMode.
  2. The scheduler will ignore the clusters in the (RB/CRB).spec.evictionTasks, and reschedule the workload.
  3. Scheduler will update (RB/CRB).spec.evictionTasks[x].evicated to true.
  4. graceful eviction controller detects (RB/CRB).spec.evictionTasks[x].evicated with true, it will delete it from (RB/CRB).spec.evictionTasks.
  5. The execution controller will delete the work once the (RB/CRB).spec.evictionTasks[x] exceeds the gracePeriodSeconds.

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: Do not reschedule the workload to the same unhealthy cluster when application failover enabled.

jwcesign avatar Nov 09 '23 09:11 jwcesign

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: To complete the pull request process, please assign garrybest after the PR has been reviewed. You can assign the PR to them by writing /assign @garrybest in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot avatar Nov 09 '23 09:11 karmada-bot

/cc @XiShanYongYe-Chang @chaunceyjiang @chaosi-zju

jwcesign avatar Nov 10 '23 07:11 jwcesign

Codecov Report

Attention: 35 lines in your changes are missing coverage. Please review.

Comparison is base (d6e2881) 52.79% compared to head (4e1c2d3) 52.67%.

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4215      +/-   ##
==========================================
- Coverage   52.79%   52.67%   -0.13%     
==========================================
  Files         239      239              
  Lines       23584    23577       -7     
==========================================
- Hits        12452    12419      -33     
- Misses      10456    10479      +23     
- Partials      676      679       +3     
Flag Coverage Δ
unittests 52.67% <31.37%> (-0.13%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
pkg/apis/work/v1alpha2/binding_types_helper.go 90.69% <100.00%> (+0.10%) :arrow_up:
...ionfailover/crb_application_failover_controller.go 57.61% <100.00%> (ø)
...tionfailover/rb_application_failover_controller.go 57.32% <100.00%> (+0.27%) :arrow_up:
pkg/controllers/gracefuleviction/evictiontask.go 86.95% <100.00%> (-9.20%) :arrow_down:
pkg/util/helper/binding.go 69.41% <62.50%> (-0.52%) :arrow_down:
...acefuleviction/crb_graceful_eviction_controller.go 0.00% <0.00%> (ø)
...racefuleviction/rb_graceful_eviction_controller.go 0.00% <0.00%> (ø)
pkg/scheduler/scheduler.go 17.26% <0.00%> (-0.51%) :arrow_down:

... and 1 file with indirect coverage changes

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Nov 10 '23 09:11 codecov-commenter

/cc @RainbowMango

I don't see the purpose of Immediately. Should we consider deprecating it?

jwcesign avatar Jan 04 '24 08:01 jwcesign

I don't see the purpose of Immediately. Should we consider deprecating it?

I added a comment on the PurgeMode Immediately with the real-world use case: https://github.com/karmada-io/karmada/blob/6f4d69cf8b2847dbeeb2f299652920dc89b049d7/pkg/apis/policy/v1alpha1/propagation_types.go#L267-L275

RainbowMango avatar Nov 29 '24 03:11 RainbowMango

/close In favor of #5881

RainbowMango avatar Nov 29 '24 03:11 RainbowMango

@RainbowMango: Closed this PR.

In response to this:

/close In favor of #5881

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

karmada-bot avatar Nov 29 '24 03:11 karmada-bot