feat: Add per-plugin nodeSelector support
Adds a nodeSelector field to HighNodeUtilizationArgs and LowNodeUtilizationArgs to allow filtering nodes per plugin.
This enables different plugins in the same profile to target different sets of nodes based on labels, solving the limitation where only a global nodeSelector was available.
Use case: Run HighNodeUtilization only on nodes blocked by cluster-autoscaler while other plugins run on all nodes.
Changes:
- Add NodeSelector field to both plugin Args structs
- Implement node filtering logic in Balance() methods
- Add nodeMatchesSelector helper function (see the sketch below)
- Generate deepcopy code for new fields
- Add comprehensive tests for nodeSelector logic
Addresses: #1486
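As a rough illustration only, the helper mentioned above could look something like the sketch below; the exact field type and signature in the actual diff may differ (plain map[string]string label matching is assumed here).

package nodeutilization

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// nodeMatchesSelector reports whether a node's labels satisfy the given
// selector map. An empty selector matches every node, so plugins that do not
// set nodeSelector keep their current behavior.
func nodeMatchesSelector(node *v1.Node, nodeSelector map[string]string) bool {
	if len(nodeSelector) == 0 {
		return true
	}
	return labels.SelectorFromSet(labels.Set(nodeSelector)).Matches(labels.Set(node.Labels))
}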
Description
Checklist
Please ensure your pull request meets the following criteria before submitting it for review; these items will be used by reviewers to assess the quality and completeness of your changes:
- [ ] Code Readability: Is the code easy to understand, well-structured, and consistent with project conventions?
- [ ] Naming Conventions: Are variable, function, and struct names descriptive and consistent?
- [ ] Code Duplication: Is there any repeated code that should be refactored?
- [ ] Function/Method Size: Are functions/methods short and focused on a single task?
- [ ] Comments & Documentation: Are comments clear, useful, and not excessive? Were comments updated where necessary?
- [ ] Error Handling: Are errors handled appropriately?
- [ ] Testing: Are there sufficient unit/integration tests?
- [ ] Performance: Are there any obvious performance issues or unnecessary computations?
- [ ] Dependencies: Are new dependencies justified?
- [ ] Logging & Monitoring: Is logging used appropriately (not too verbose, not too silent)?
- [ ] Backward Compatibility: Does this change break any existing functionality or APIs?
- [ ] Resource Management: Are resources (files, connections, memory) managed and released properly?
- [ ] PR Description: Is the PR description clear, providing enough context and explaining the motivation for the change?
- [ ] Documentation & Changelog: Are README and docs updated if necessary?
- :white_check_mark: login: SarCode / name: Sarthak Agarwal (3a24013acb21055a3dc09f2d168cab8411400078, 5fb59119ce2d7149a66b1f4d0e9e9f36533368ba, fc9d6ed919c3483cc877c6ba0cb16339f1dfac85)
The committers listed above are authorized under a signed CLA.
Welcome @SarCode!
It looks like this is your first PR to kubernetes-sigs/descheduler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes-sigs/descheduler has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @SarCode. Thanks for your PR.
I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ingvagabund for approval. For more information see the Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/ok-to-test
@SarCode: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
In response to this:
/ok-to-test
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Apologies if I am being too eager, but is there a way I could get a review on this? It's my first open source contribution, so I'm really excited about this 😁
@googs1025 or @ingvagabund maybe?
/ok-to-test
/retest
/retest
Formatting issues fixed 😃
@googs1025, the tests went through 👌
@SarCode thank you for your first contribution here and welcome to the open source community.
The current approach is less practical since only two plugins will benefit from it. Also, each plugin will need to specify the node selector, which is more suitable as a profile-level field. Saying that, I recommend taking the more generic approach and defining a node selector field at the profile level.
Thank you for the kind words @ingvagabund !
I was thinking of the same thing, but I thought it would be easier if I contained it to just the Utilization use case. I will make some changes and make it more generic 👌
What if we add an optional profile-level nodeSelector while keeping the existing plugin-level support:
type DeschedulerProfile struct {
	Name          string            `json:"name"`
	PluginConfigs []PluginConfig    `json:"pluginConfig"`
	Plugins       Plugins           `json:"plugins"`
	NodeSelector  map[string]string `json:"nodeSelector,omitempty"` // NEW
}
It will look something like this:
profiles:
  - name: default
    nodeSelector: # Applies to ALL plugins in the profile
      node-type: worker
    pluginConfig:
      - name: LowNodeUtilization
        args: {} # No plugin-level nodeSelector needed
NodeSelector needs to be a string to be consistent with the current globally scoped node selector.
What if we add an optional profile-level nodeSelector while keeping the existing plugin-level support
Would there be any use for the plugin-level node selector?
Thank you for your contribution, since I barely have time to fix it myself... I will review it as soon as possible, too.
What if we add an optional profile-level nodeSelector while keeping the existing plugin-level support
Would there be any use for the plugin-level node selector?
Agree. Just a profile-level node selector seems to be ok. Because from my perspective, if someone wants to configure LowNodeUtilization with "no" nodeSelector while configuring another plugin with "some" nodeSelector, it's better divided into 2 separate profiles.
If I understand this correctly, the descheduler should run with separate profiles if they want different nodeSelector?
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  # Profile 1: For worker nodes only
  - name: "worker-nodes"
    nodeSelector: "node-type=worker" # Profile-level nodeSelector
    pluginConfig:
      - name: "LowNodeUtilization"
        args:
          thresholds:
            cpu: 20
            memory: 20
          targetThresholds:
            cpu: 50
            memory: 50
    plugins:
      deschedule:
        enabled:
          - "LowNodeUtilization"
  # Profile 2: For master nodes only
  - name: "master-nodes"
    nodeSelector: "node-type=master" # Different nodeSelector
    pluginConfig:
      - name: "HighNodeUtilization"
        args:
          thresholds:
            cpu: 80
            memory: 80
    plugins:
      balance:
        enabled:
          - "HighNodeUtilization"
Does descheduler support this right now? Because if I remember correctly, nodeSelector is global for all profiles.
Looking at the docs it seems like profile right now does have a nodeSelector config
create your own Evictor Plugin or use the Default one provided by Descheduler. Other uses for the Evictor plugin can be to sort, filter, validate or group pods by different criteria, and that's why this is handled by a plugin and not configured in the top level config
https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#evictor-plugin-configuration-default-evictor
If I understand this correctly, the descheduler should run with separate profiles if they want different nodeSelector?
apiVersion: "descheduler/v1alpha2" kind: "DeschedulerPolicy" profiles: # Profile 1: For worker nodes only - name: "worker-nodes" nodeSelector: "node-type=worker" # Profile-level nodeSelector pluginConfig: - name: "LowNodeUtilization" args: thresholds: cpu: 20 memory: 20 targetThresholds: cpu: 50 memory: 50 plugins: deschedule: enabled: - "LowNodeUtilization" # Profile 2: For master nodes only - name: "master-nodes" nodeSelector: "node-type=master" # Different nodeSelector pluginConfig: - name: "HighNodeUtilization" args: thresholds: cpu: 80 memory: 80 plugins: balance: enabled: - "HighNodeUtilization"Does descheduler supports this right now? cause if I remember correctly nodeSelector is global for all profiles
Yes, your example is just what I want to implement and what I proposed inside #1486. I don't think descheduler supports this right now. That's why we're implementing it here :)
Looking at the docs it seems like profile right now does have a nodeSelector config
If I understand it right, this is a plugin-level nodeSelector only for DefaultEvictor.
I tried the two-profile config with different nodeSelectors and it worked:
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  # Profile 1: HighNodeUtilization targeting only worker nodes
  - name: high-utilization-workers
    pluginConfig:
      - name: DefaultEvictor
        args:
          nodeSelector: "type=worker" # Only process worker nodes
          evictLocalStoragePods: true
          ignorePvcPods: true
          nodeFit: true
      - name: HighNodeUtilization
        args:
          thresholds:
            cpu: 50
            memory: 50
    plugins:
      balance:
        enabled:
          - HighNodeUtilization
  # Profile 2: Other plugins targeting all non-control-plane nodes
  - name: general-policies
    pluginConfig:
      - name: DefaultEvictor
        args:
          nodeSelector: "!node-role.kubernetes.io/control-plane" # All nodes except control-plane
          evictLocalStoragePods: true
          ignorePvcPods: true
          nodeFit: true
      - name: RemovePodsViolatingNodeAffinity
        args:
          nodeAffinityType:
            - requiredDuringSchedulingIgnoredDuringExecution
      - name: RemovePodsViolatingNodeTaints
        args:
          includePreferNoSchedule: true
      - name: RemovePodsViolatingInterPodAntiAffinity
      - name: RemovePodsViolatingTopologySpreadConstraint
    plugins:
      balance:
        enabled:
          - RemovePodsViolatingTopologySpreadConstraint
      deschedule:
        enabled:
          - RemovePodsViolatingNodeTaints
          - RemovePodsViolatingNodeAffinity
          - RemovePodsViolatingInterPodAntiAffinity
The logs confirmed:
- Profile 1 correctly processed only worker nodes for HighNodeUtilization.
- Profile 2 correctly processed all non-control-plane nodes for the other plugins.
@SarCode Thanks for your verification! But just as I described, this use case is trying to configure DefaultEvictor in 2 profiles. The nodeSelector is DefaultEvictor's argument. It's a tricky way to meet the initial requirement, because right now only DefaultEvictor implements the EvictorPlugin interface. If someday we have a "FooEvictor" that implements EvictorPlugin, then the "FooEvictor" must support nodeSelector as well.
So what I hope to do is add nodeSelector to DeschedulerProfile, maybe like this:
type DeschedulerProfile struct {
	Name          string
	PluginConfigs []PluginConfig
	Plugins       Plugins
	NodeSelector  *metav1.LabelSelector // or *string
}
This prevents every plugin from needing to support nodeSelector one by one.
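Just to sketch the idea (none of this is in the PR; the helper name and wiring below are hypothetical): the profile-level selector could be resolved once when the profile builds its node list, so plugins never have to deal with it themselves. Whether the field is *metav1.LabelSelector or *string only changes how the selector is parsed (labels.Parse for the string form).

package descheduler

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// filterProfileNodes is a hypothetical helper: it applies a profile-level
// label selector once so that every plugin in the profile only sees the
// matching nodes. A nil selector keeps the full node list.
func filterProfileNodes(nodes []*v1.Node, nodeSelector *metav1.LabelSelector) ([]*v1.Node, error) {
	if nodeSelector == nil {
		return nodes, nil
	}
	selector, err := metav1.LabelSelectorAsSelector(nodeSelector)
	if err != nil {
		return nil, err
	}
	filtered := make([]*v1.Node, 0, len(nodes))
	for _, node := range nodes {
		if selector.Matches(labels.Set(node.Labels)) {
			filtered = append(filtered, node)
		}
	}
	return filtered, nil
}

Either way the check happens once per profile, not once per plugin.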
My intention behind all this is to run HighNodeUtilization only on nodes which can't be removed by the cluster-autoscaler, maybe due to local storage or similar.
Meanwhile, all other plugins can run on all nodes.
A nodeSelector at the profile level can skip many nodes, whereas if it is used through the DefaultEvictor, all nodes still get processed, which will not be that effective in big clusters.