karmada icon indicating copy to clipboard operation
karmada copied to clipboard

fix the issue where federatedresourcequota becomes invalid when the workload scales up

Open zhzhuang-zju opened this issue 6 months ago • 4 comments

What type of PR is this? /kind bug

What this PR does / why we need it: We introduced federatedresourcequota enforcement in v1.14 to achieve unified quota management across clusters. This is realized by intercepting modifications to workload requirements and the spec.clusters field of resource bindings. However, the federated resource quota mechanism fails when workloads scale up.

The root cause is that the binding controller synchronizes workload modifications to member clusters. When the replicas scheduling policy is set to "duplicated", the binding controller does not revise the workload's replicas according to the scheduling results in bindingSpec.clusters. As a result, the scale-up operations of workloads are directly synchronized to member clusters, rendering the federated resource quota ineffective.

The detailed process is as follows:

  1. Edit the replicas of a workload.
  2. The binding controller synchronizes the workload's replicas to member clusters.
  3. Karmada-scheduler performs rescheduling, and the request to update bindingSpec.clusters is intercepted due to insufficient quota.

As described above, even though the update request for bindingSpec.clusters is blocked, the replicas of the workload in member clusters are still synchronized, leading to the failure of the federated resource quota.

This PR modifies the logic for the binding controller to sync workloads to member clusters. It syncs the replicas in bindingSpec.clusters to member clusters instead of the replicas of the workload.

Which issue(s) this PR fixes: Parts of #6467

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: fix the issue where federatedresourcequota becomes invalid when the workload scales up

zhzhuang-zju avatar Jun 19 '25 06:06 zhzhuang-zju

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 30.00000% with 14 lines in your changes missing coverage. Please review.

Project coverage is 48.98%. Comparing base (571801f) to head (7b9bf2f). Report is 15 commits behind head on master.

Files with missing lines Patch % Lines
pkg/controllers/binding/common.go 30.00% 13 Missing and 1 partial :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6474      +/-   ##
==========================================
- Coverage   49.03%   48.98%   -0.06%     
==========================================
  Files         687      687              
  Lines       56038    56097      +59     
==========================================
  Hits        27478    27478              
- Misses      26779    26839      +60     
+ Partials     1781     1780       -1     
Flag Coverage Δ
unittests 48.98% <30.00%> (-0.06%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Jun 19 '25 06:06 codecov-commenter

@mszacillo Do you think it's necessary to backport this PR to release-1.14?

zhzhuang-zju avatar Jun 20 '25 09:06 zhzhuang-zju

@mszacillo Do you think it's necessary to backport this PR to release-1.14?

Yes I think so. I currently rebased our internal branch against the latest 1.14 release, so having this fix in that release would be helpful.

mszacillo avatar Jun 22 '25 12:06 mszacillo

/retitle Fix workload scale up bypass FederatedResourceQuota check issue

RainbowMango avatar Jun 25 '25 02:06 RainbowMango

@mszacillo Do you think it's necessary to backport this PR to release-1.14?

Yes I think so. I currently rebased our internal branch against the latest 1.14 release, so having this fix in that release would be helpful.

I don't mind backporting this. But I don't think this issue will affect @mszacillo's scenario, as it only applies to workloads using Duplicated mode.

RainbowMango avatar Jun 25 '25 02:06 RainbowMango

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mszacillo, RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot avatar Jun 25 '25 02:06 karmada-bot