descheduler
Descheduler seems to ignore cordoned nodes when nodeFit is enabled
What version of descheduler are you using?
descheduler version: v0.24.1, installed via the Helm chart
Does this issue reproduce with the latest release? yes
Please provide a copy of your descheduler policy config file
Helm values.yaml
kind: Deployment
deschedulerPolicy:
  strategies:
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true
      params:
        includeSoftConstraints: true
        nodeFit: true
    RemoveDuplicates:
      enabled: false
    RemovePodsViolatingNodeTaints:
      enabled: false
    RemovePodsViolatingNodeAffinity:
      enabled: false
    RemovePodsViolatingInterPodAntiAffinity:
      enabled: false
    LowNodeUtilization:
      enabled: false
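For context, with the chart's default templating these values should render to roughly the following policy (a sketch assuming the v1alpha1 DeschedulerPolicy API used by v0.24):

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingTopologySpreadConstraint":
    enabled: true
    params:
      includeSoftConstraints: true
      nodeFit: true
```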
What k8s version are you using (kubectl version)?
kubectl version output:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"archive", BuildDate:"2022-05-27T18:33:09Z", GoVersion:"go1.18.2", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
What did you do?
- Created a cluster with minikube (5 nodes)
- Set the first node as unschedulable with the taint node-role.kubernetes.io/master:NoSchedule
- Labeled nodes 2 and 3 with zone=primary and nodes 4 and 5 with zone=backup
- Created an nginx deployment with topologySpreadConstraints on zone with maxSkew: 1 (yaml in attachment; a sketch is shown after this list)
- Cordoned and drained nodes 4 and 5
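The attached deployment manifest is not reproduced here; a minimal sketch of a constraint matching the description (the app label, replica count, and whenUnsatisfiable value are assumptions) looks like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 10                 # assumed for illustration
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: zone                  # the node label set above
          whenUnsatisfiable: ScheduleAnyway  # assumed soft constraint; includeSoftConstraints is enabled
          labelSelector:
            matchLabels:
              app: nginx
      containers:
        - name: nginx
          image: nginx
```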
What did you expect to see? Pods drain entirely to the zone=primary nodes and stay there until I uncordon nodes 4 and 5, after which the descheduler does its thing and moves half of them back to zone=backup.
What did you see instead?
Pods drain to the zone=primary nodes; however, the descheduler does not seem to respect that all zone=backup nodes are unschedulable, and it keeps evicting half of the pods on the zone=primary nodes forever.
Attached as nodes.yaml is the output of kubectl get nodes -o yaml: Archive.zip
Mentioned in Slack, but I think the solution may be to either:
- change the set of nodes passed to the topology spread strategy, or
- add logic to the strategy to ignore tainted nodes in its domain calculation

Right now, the strategy just operates on the default set of ReadyNodes, which includes nodes that are tainted with NoSchedule. So even though pods can't be rebalanced onto those nodes, the strategy still includes them in its calculation of imbalanced domains.

imo, this strategy simply shouldn't care about tainted nodes. If a node is tainted with NoSchedule, there is no point in trying to balance it as part of the domain.

The argument could be made that the NoSchedule nodes might be oversized and thus need pods to be evicted, but that is more a job for the RemovePodsViolatingNodeTaints strategy, so imo that point is moot (a sketch of that configuration is shown below).
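For completeness, if evicting pods off NoSchedule-tainted nodes were actually the goal, a sketch of Helm values enabling that strategy (same chart layout as the values above; the params are assumptions) would be:

```yaml
deschedulerPolicy:
  strategies:
    RemovePodsViolatingNodeTaints:
      enabled: true
      params:
        nodeFit: true   # only evict pods that fit on another schedulable node
```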
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.