descheduler
topologySpreadConstraint isn't working as expected
What version of descheduler are you using?
descheduler version: v0.24.0
Does this issue reproduce with the latest release?
YES
Which descheduler CLI options are you using?
Not using CLI
Please provide a copy of your descheduler policy config file
deschedulerPolicy:
  strategies:
    RemoveDuplicates:
      enabled: true
    RemovePodsViolatingNodeTaints:
      enabled: true
    RemovePodsViolatingNodeAffinity:
      enabled: false
      params:
        nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
    RemovePodsViolatingInterPodAntiAffinity:
      enabled: false
    LowNodeUtilization:
      enabled: false
      params:
        nodeResourceUtilizationThresholds:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds:
            cpu: 50
            memory: 50
            pods: 50
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true
      params:
        namespaces:
          include:
            - "back"
What k8s version are you using (kubectl version)?
kubectl version output:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5", GitCommit:"aea7bbadd2fc0cd689de94a54e5b7b758869d691", GitTreeState:"clean", BuildDate:"2021-09-15T21:04:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
What did you do?
Hi. I've got 3 AZs (AZ1, AZ2, AZ3) and a deployment with 3 replicas and the following topologySpreadConstraints:
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: app-name
        app.kubernetes.io/name: app-name
    maxSkew: 2
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: app-name
        app.kubernetes.io/name: app-name
    maxSkew: 1
    topologyKey: topology.kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
app-name-fcd89fb79-9ffzp   2/2   Running   0   18m   10.10.28.8     cl12rs9frjb2cnacj66g-yfom   --AZ3
app-name-fcd89fb79-wvlkc   2/2   Running   0   18m   10.10.27.152   cl13khna1utejao014uh-oqar   --AZ2
app-name-fcd89fb79-26x4t   2/2   Running   0   18m   10.10.26.39    cl1d0jmpptqnae8fc5ep-uqew   --AZ1
So, if AZ1 goes down, its pod is killed and rescheduled to AZ2 or AZ3. But when AZ1 came back, the descheduler did not evict the pod from the other zones, and the topology is now imbalanced: AZ3 has two pods, AZ2 has one pod, and AZ1 has none.
app-name-fcd89fb79-9ffzp   2/2   Running   0   18m   10.10.28.8     cl12rs9frjb2cnacj66g-yfom   --AZ3
app-name-fcd89fb79-wvlkc   2/2   Running   0   18m   10.10.27.152   cl13khna1utejao014uh-oqar   --AZ2
app-name-fcd89fb79-zpp2d   2/2   Running   0   18m   10.10.28.8     cl12rs9frjb2cnacj66g-yfom   --AZ3
In the log:
I0816 10:46:45.061388 1 topologyspreadconstraint.go:168] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:2 TopologyKey:topology.kubernetes.io/zone WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{app.kubernetes.io/instance: app-name,app.kubernetes.io/name: app-name,},MatchExpressions:[]LabelSelectorRequirement{},} MinDomains:
What did you expect to see?
I expected that when the AZ came back, the descheduler would rebalance the pods across all available AZs.
What did you see instead?
The descheduler did not rebalance the pods, and AZ1 has no pods of the app.
Hi @mczimm,
We talked about this in Slack, but to continue the discussion here for visibility: the maxSkew: 2 constraint is already balanced when the third AZ comes back up and the zone sizes are [2, 1, 0], so there's nothing for the descheduler to do in that case.
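For reference, skew here follows the upstream definition: the difference between the most and the least populated topology domain. With zone sizes [2, 1, 0] that is 2 - 0 = 2, which does not exceed maxSkew: 2, so the zone constraint counts as satisfied.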
However, the second constraint (topologyKey: hostname, maxSkew: 1, whenUnsatisfiable: ScheduleAnyway) should cause an eviction because of its maxSkew: 1. Do you have any more logs from the descheduler that mention that constraint? The logs you shared only mention the first one, with maxSkew: 2. Please try to reproduce this with log level -v=4 and post the full output if you can.
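If you deploy with the Helm chart, one way to raise the verbosity is through the chart values; this is only a sketch and assumes the chart's cmdOptions passthrough for CLI flags (adjust it to however you actually run the descheduler):

cmdOptions:
  v: 4   # klog verbosity level requested above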
We don't have any test cases for this exact scenario (this is the closest one), but if we can reproduce it in a unit test we could find out more about this issue.
Hi @damemi, I did more tests with -v=4, and here is what I've got. The topologySpreadConstraints are now:
topologySpreadConstraints:
- labelSelector:
matchLabels:
app.kubernetes.io/instance: app-name
app.kubernetes.io/name: app-name
maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
The descheduler has this config:
deschedulerPolicy:
  strategies:
    RemoveDuplicates:
      enabled: true
    RemovePodsViolatingNodeTaints:
      enabled: true
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true
      params:
        includeSoftConstraints: true
        namespaces:
          include:
            - "back"
After AZ1 fails and comes back, the pods are spread like this: [2 pods / 1 pod / 0 pods].
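With that spread the zone skew is 2 - 0 = 2, which exceeds the new maxSkew: 1, so the constraint is violated and the descheduler (with includeSoftConstraints: true) tries to evict a pod.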
The descheduler logs:
I0816 14:25:41.458657 1 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="back/app-name-7875b946dc-d889d" checks="pod has local storage and descheduler is not configured with evictLocalStoragePods"
After setting evictLocalStoragePods to true, the pods were evicted successfully.
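For completeness, a sketch of how the values look with that flag, assuming the same Helm-style layout as above (evictLocalStoragePods is a top-level policy field, so it sits next to strategies):

deschedulerPolicy:
  evictLocalStoragePods: true   # allows eviction of pods with local storage
  strategies:
    RemoveDuplicates:
      enabled: true
    RemovePodsViolatingNodeTaints:
      enabled: true
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true
      params:
        includeSoftConstraints: true
        namespaces:
          include:
            - "back"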
I have one more question: the metrics include the descheduler's own pod name; is it possible to have the name of the pod that was evicted instead?
In the logs:
I0816 14:25:41.458657 1 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="back/app-name-7875b946dc-d889d" checks="pod has local storage and descheduler is not configured with evictLocalStoragePods"
In the metrics:
descheduler_pods_evicted{cloud="cloud", cluster="cluster", container="descheduler", endpoint="http-metrics", instance="10.10.32.135:10258", job="descheduler", namespace="back", node="cl13khna1utejao014uh-oqar", pod="descheduler-64f4ccfdb7-cq5sm", prometheus="prometheus", result="success", service="descheduler", strategy="PodTopologySpread"}
Hi @damemi, could you please tell me where it would be most convenient for you to discuss the pod name in the metrics: within this issue, or should I open a new one?
@mczimm let's open another issue for that and close this one as fixed. I don't think that metric will really be the right place for that kind of label, but we can discuss other options.
/close
@damemi: Closing this issue.