descheduler
topologySpreadConstraint isn't working as expected
What version of descheduler are you using?
descheduler version: v0.24.0
Does this issue reproduce with the latest release?
YES
Which descheduler CLI options are you using?
Not using CLI
Please provide a copy of your descheduler policy config file
deschedulerPolicy:
  strategies:
    RemoveDuplicates:
      enabled: true
    RemovePodsViolatingNodeTaints:
      enabled: true
    RemovePodsViolatingNodeAffinity:
      enabled: false
      params:
        nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
    RemovePodsViolatingInterPodAntiAffinity:
      enabled: false
    LowNodeUtilization:
      enabled: false
      params:
        nodeResourceUtilizationThresholds:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds:
            cpu: 50
            memory: 50
            pods: 50
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true
      params:
        namespaces:
          include:
            - "back"
What k8s version are you using (kubectl version)?
kubectl version output:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5", GitCommit:"aea7bbadd2fc0cd689de94a54e5b7b758869d691", GitTreeState:"clean", BuildDate:"2021-09-15T21:04:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
What did you do?
Hi. I've got 3 AZs (AZ1, AZ2, AZ3) and a deployment with 3 replicas and the following topologySpreadConstraints:
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: app-name
        app.kubernetes.io/name: app-name
    maxSkew: 2
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: app-name
        app.kubernetes.io/name: app-name
    maxSkew: 1
    topologyKey: topology.kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
app-name-fcd89fb79-9ffzp   2/2   Running   0   18m   10.10.28.8     cl12rs9frjb2cnacj66g-yfom   --AZ3
app-name-fcd89fb79-wvlkc   2/2   Running   0   18m   10.10.27.152   cl13khna1utejao014uh-oqar   --AZ2
app-name-fcd89fb79-26x4t   2/2   Running   0   18m   10.10.26.39    cl1d0jmpptqnae8fc5ep-uqew   --AZ1
So, if AZ1 goes down, its pod is killed and rescheduled to AZ2 or AZ3. But when AZ1 came back, the descheduler did not evict the pod from the other zones, and the topology is now imbalanced: AZ3 has two pods, AZ2 has one pod, and AZ1 has none.
app-name-fcd89fb79-9ffzp   2/2   Running   0   18m   10.10.28.8     cl12rs9frjb2cnacj66g-yfom   --AZ3
app-name-fcd89fb79-wvlkc   2/2   Running   0   18m   10.10.27.152   cl13khna1utejao014uh-oqar   --AZ2
app-name-fcd89fb79-zpp2d   2/2   Running   0   18m   10.10.28.8     cl12rs9frjb2cnacj66g-yfom   --AZ3
In the log:
I0816 10:46:45.061388 1 topologyspreadconstraint.go:168] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:2 TopologyKey:topology.kubernetes.io/zone WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{app.kubernetes.io/instance: app-name,app.kubernetes.io/name: app-name,},MatchExpressions:[]LabelSelectorRequirement{},} MinDomains:
What did you expect to see?
I expected that when the AZ came back, the descheduler would rebalance the pods across all available AZs.
What did you see instead?
The descheduler did not rebalance the pods, and AZ1 has no pods of the app.
Hi @mczimm,
We talked about this in Slack, but to continue the discussion here for visibility: the maxSkew: 2 constraint is already balanced when the third AZ comes back up and the zone sizes are [2, 1, 0], so there's nothing for the descheduler to do in that case.
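For reference, skew here follows the upstream definition: the difference between the most and the least populated topology domain. With zone sizes [2, 1, 0] that is 2 - 0 = 2, which does not exceed maxSkew: 2, so the zone constraint counts as satisfied.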
However, the second constraint (topologyKey: hostname, maxSkew: 1, whenUnsatisfiable: ScheduleAnyway) should cause an eviction because of its maxSkew: 1. Do you have any more logs from the descheduler that mention that constraint? The logs you shared only mention the first one, with maxSkew: 2. Please try to reproduce this with log level -v=4 and post the full output if you can.
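If you deploy with the Helm chart, one way to raise the verbosity is through the chart values; this is only a sketch and assumes the chart's cmdOptions passthrough for CLI flags (adjust it to however you actually run the descheduler):

cmdOptions:
  v: 4   # klog verbosity level requested above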
We don't have any test cases for this exact scenario (this is the closest one), but if we can reproduce it in a unit test we could find out more about this issue.
Hi @damemi, I did more tests with -v=4, and here is what I've got. The topologySpreadConstraints are now:
topologySpreadConstraints:
- labelSelector:
matchLabels:
app.kubernetes.io/instance: app-name
app.kubernetes.io/name: app-name
maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
The descheduler has this config:
deschedulerPolicy:
  strategies:
    RemoveDuplicates:
      enabled: true
    RemovePodsViolatingNodeTaints:
      enabled: true
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true
      params:
        includeSoftConstraints: true
        namespaces:
          include:
            - "back"
After AZ1 fails and comes back, the pods are spread like this: [2 pods / 1 pod / 0 pods].
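With that spread the zone skew is 2 - 0 = 2, which exceeds the new maxSkew: 1, so the constraint is violated and the descheduler (with includeSoftConstraints: true) tries to evict a pod.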
The descheduler logs:
I0816 14:25:41.458657 1 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="back/app-name-7875b946dc-d889d" checks="pod has local storage and descheduler is not configured with evictLocalStoragePods"
After setting evictLocalStoragePods to true, the pods were evicted successfully.
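For completeness, a sketch of how the values look with that flag, assuming the same Helm-style layout as above (evictLocalStoragePods is a top-level policy field, so it sits next to strategies):

deschedulerPolicy:
  evictLocalStoragePods: true   # allows eviction of pods with local storage
  strategies:
    RemoveDuplicates:
      enabled: true
    RemovePodsViolatingNodeTaints:
      enabled: true
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true
      params:
        includeSoftConstraints: true
        namespaces:
          include:
            - "back"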
I have one more question: the metrics include the descheduler's own pod name; is it possible to have the name of the pod that was evicted instead?
In the logs:
I0816 14:25:41.458657 1 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="back/app-name-7875b946dc-d889d" checks="pod has local storage and descheduler is not configured with evictLocalStoragePods"
In the metrics:
descheduler_pods_evicted{cloud="cloud", cluster="cluster", container="descheduler", endpoint="http-metrics", instance="10.10.32.135:10258", job="descheduler", namespace="back", node="cl13khna1utejao014uh-oqar", pod="descheduler-64f4ccfdb7-cq5sm", prometheus="prometheus", result="success", service="descheduler", strategy="PodTopologySpread"}
Hi @damemi, could you please tell me where it would be most convenient for you to discuss the pod name in the metrics: within this issue, or should I open a new one?
@mczimm let's open another issue for that and close this one as fixed. I don't think that metric will really be the right place for that kind of label, but we can discuss other options.
/close
@damemi: Closing this issue.