
LowNodeUtilization doesn't check nodeSelector/nodeAffinity when choosing pods to evict


What version of descheduler are you using?

descheduler version: 0.24.1

Does this issue reproduce with the latest release?

Yes.

Which descheduler CLI options are you using?

- --policy-config-file
- /policy-dir/policy.yaml
- --v
- "3"

Please provide a copy of your descheduler policy config file

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
evictLocalStoragePods: true
ignorePvcPods: true
maxNoOfPodsToEvictPerNamespace: 1
maxNoOfPodsToEvictPerNode: 1
strategies:
  LowNodeUtilization:
    enabled: true
    params:
      nodeFit: true
      nodeResourceUtilizationThresholds:
        targetThresholds:
          cpu: 50
          memory: 50
          pods: 50
        thresholds:
          cpu: 20
          memory: 20
          pods: 20

What k8s version are you using (kubectl version)?

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.14", GitCommit:"57a3aa3f13699cf3db9c52d228c18db94fa81876", GitTreeState:"clean", BuildDate:"2021-12-15T14:52:33Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.14", GitCommit:"57a3aa3f13699cf3db9c52d228c18db94fa81876", GitTreeState:"clean", BuildDate:"2021-12-15T14:47:10Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

I have 4 nodes in the cluster (node1 and node2 are overutilized, node3 is underutilized):

NAME    LABELS
node1   role=worker
node2   role=worker
node3   role=worker2

All pods (except those created by DaemonSets) have a nodeSelector with role=worker or role=worker2.

Then I run the descheduler with the config above.

What did you expect to see?

The descheduler does nothing (pods from node1 and node2 don't fit on node3 because of their nodeSelector).

What did you see instead?

The descheduler evicts pods from node1 and node2 on every launch.

seleznev avatar Jul 13 '22 10:07 seleznev

@seleznev thanks for this. Could you share the logs from the descheduler showing these evictions, ideally at the v=4 log level? That should give us an idea of why it's doing them.

damemi avatar Jul 13 '22 14:07 damemi

I tried to clean up the cluster, but there's still a lot of noise in the logs, sorry. :( Also, I removed node4 from the description to match the logs below.

--v=4 --dry-run
I0714 15:14:14.519845   49426 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1657800854\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1657800854\" (2022-07-14 11:14:14 +0000 UTC to 2023-07-14 11:14:14 +0000 UTC (now=2022-07-14 12:14:14.519807269 +0000 UTC))"
I0714 15:14:14.519913   49426 secure_serving.go:210] Serving securely on [::]:10258
I0714 15:14:14.519953   49426 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0714 15:14:14.972570   49426 reflector.go:219] Starting reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:134
I0714 15:14:14.972580   49426 reflector.go:219] Starting reflector *v1.PriorityClass (0s) from k8s.io/client-go/informers/factory.go:134
I0714 15:14:14.972585   49426 reflector.go:219] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0714 15:14:14.972593   49426 reflector.go:255] Listing and watching *v1.PriorityClass from k8s.io/client-go/informers/factory.go:134
I0714 15:14:14.972597   49426 reflector.go:255] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:134
I0714 15:14:14.972589   49426 reflector.go:255] Listing and watching *v1.Namespace from k8s.io/client-go/informers/factory.go:134
I0714 15:14:15.072905   49426 shared_informer.go:285] caches populated
I0714 15:14:15.072936   49426 shared_informer.go:285] caches populated
I0714 15:14:15.473308   49426 shared_informer.go:285] caches populated
I0714 15:14:15.473471   49426 node.go:49] "Node lister returned empty list, now fetch directly"
I0714 15:14:15.569834   49426 descheduler.go:253] Building a cached client from the cluster for the dry run
I0714 15:14:15.569875   49426 descheduler.go:120] Pulling resources for the cached client from the cluster
I0714 15:14:15.586582   49426 reflector.go:219] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0714 15:14:15.586597   49426 reflector.go:255] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:134
I0714 15:14:15.686657   49426 shared_informer.go:285] caches populated
I0714 15:14:15.686697   49426 descheduler.go:278] Building a pod evictor
I0714 15:14:15.687101   49426 nodeutilization.go:224] "Node is overutilized" node="node1" usage=map[cpu:1619m memory:6520041673 pods:22] usagePercentage=map[cpu:40.576441102756895 memory:86.49085210728913 pods:36.666666666666664]
I0714 15:14:15.687143   49426 nodeutilization.go:224] "Node is overutilized" node="node2" usage=map[cpu:1483m memory:5123585820 pods:28] usagePercentage=map[cpu:37.16791979949875 memory:67.96632991652716 pods:46.666666666666664]
I0714 15:14:15.687171   49426 nodeutilization.go:221] "Node is underutilized" node="node3" usage=map[cpu:251m memory:406994944 pods:3] usagePercentage=map[cpu:6.290726817042606 memory:5.053364510004751 pods:2.727272727272727]
I0714 15:14:15.687196   49426 lownodeutilization.go:118] "Criteria for a node under utilization" CPU=20 Mem=20 Pods=20
I0714 15:14:15.687212   49426 lownodeutilization.go:119] "Number of underutilized nodes" totalNumber=1
I0714 15:14:15.687231   49426 lownodeutilization.go:132] "Criteria for a node above target utilization" CPU=50 Mem=50 Pods=50
I0714 15:14:15.687246   49426 lownodeutilization.go:133] "Number of overutilized nodes" totalNumber=2
I0714 15:14:15.687280   49426 nodeutilization.go:277] "Total capacity to be moved" CPU=1744 Mem=3619975040 Pods=52
I0714 15:14:15.687300   49426 nodeutilization.go:280] "Evicting pods from node" node="node1" usage=map[cpu:1619m memory:6520041673 pods:22]
I0714 15:14:15.687482   49426 node.go:148] "Pod does not fit on node" pod="kube-system/topolvm-node-nb5mz" node="node2"
I0714 15:14:15.687504   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.687542   49426 node.go:148] "Pod does not fit on node" pod="kube-system/topolvm-node-nb5mz" node="node3"
I0714 15:14:15.687561   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.687598   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/topolvm-node-nb5mz" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.687752   49426 node.go:145] "Pod fits on node" pod="kube-system/vpa-manager-8668966dc5-pz226" node="node2"
I0714 15:14:15.687788   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/vpa-manager-8668966dc5-pz226" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.687911   49426 node.go:145] "Pod fits on node" pod="kube-system/calico-typha-5458d7dc9b-d8dk2" node="node2"
I0714 15:14:15.687956   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/calico-typha-5458d7dc9b-d8dk2" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.688090   49426 node.go:145] "Pod fits on node" pod="kube-system/cluster-autoscaler-74f6bc9c5-zwm87" node="node2"
I0714 15:14:15.688118   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/cluster-autoscaler-74f6bc9c5-zwm87" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.688229   49426 node.go:145] "Pod fits on node" pod="kube-system/kube-thanos-query-7bc94f947b-n28rj" node="node2"
I0714 15:14:15.688258   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-thanos-query-7bc94f947b-n28rj" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.688370   49426 node.go:148] "Pod does not fit on node" pod="kube-system/topolvm-lvmd-0-dxl4c" node="node2"
I0714 15:14:15.688387   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.688418   49426 node.go:148] "Pod does not fit on node" pod="kube-system/topolvm-lvmd-0-dxl4c" node="node3"
I0714 15:14:15.688432   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.688461   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/topolvm-lvmd-0-dxl4c" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.688583   49426 node.go:148] "Pod does not fit on node" pod="kube-system/unbound-th87x" node="node2"
I0714 15:14:15.688601   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.688643   49426 node.go:148] "Pod does not fit on node" pod="kube-system/unbound-th87x" node="node3"
I0714 15:14:15.688696   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.688728   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/unbound-th87x" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.688840   49426 node.go:145] "Pod fits on node" pod="io/kichay-deploy-test-5565f7b678-2cx4j" node="node2"
I0714 15:14:15.688963   49426 node.go:145] "Pod fits on node" pod="kube-system/kube-resource-redis-5998d76c7-fzlpn" node="node2"
I0714 15:14:15.688992   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-resource-redis-5998d76c7-fzlpn" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.689131   49426 node.go:145] "Pod fits on node" pod="kube-system/vpa-recommender-8554864b8d-lhkkt" node="node2"
I0714 15:14:15.689164   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/vpa-recommender-8554864b8d-lhkkt" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.689285   49426 node.go:145] "Pod fits on node" pod="kube-system/vpa-updater-6b59d8b6df-vvwsp" node="node2"
I0714 15:14:15.689312   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/vpa-updater-6b59d8b6df-vvwsp" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.689422   49426 node.go:145] "Pod fits on node" pod="gitlab/makisu-redis-74c597947f-mzb26" node="node2"
I0714 15:14:15.689446   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="gitlab/makisu-redis-74c597947f-mzb26" checks="pod has a PVC and descheduler is configured to ignore PVC pods"
I0714 15:14:15.689561   49426 node.go:145] "Pod fits on node" pod="kube-system/kube-prometheus-perf-b65c94486-qfmdj" node="node2"
I0714 15:14:15.689592   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-prometheus-perf-b65c94486-qfmdj" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.689700   49426 node.go:148] "Pod does not fit on node" pod="kube-system/calico-node-qbcw2" node="node2"
I0714 15:14:15.689717   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.689753   49426 node.go:148] "Pod does not fit on node" pod="kube-system/calico-node-qbcw2" node="node3"
I0714 15:14:15.689768   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.689805   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/calico-node-qbcw2" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.689961   49426 node.go:145] "Pod fits on node" pod="kube-system/csi-rbdplugin-provisioner-8cb6c6b99-5568b" node="node2"
I0714 15:14:15.690072   49426 node.go:145] "Pod fits on node" pod="kube-system/kube-blackbox-9f7dcb4f-zhzhr" node="node2"
I0714 15:14:15.690101   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-blackbox-9f7dcb4f-zhzhr" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.690220   49426 node.go:145] "Pod fits on node" pod="kube-system/kube-prometheus-cluster-65c76558c4-4jrlp" node="node2"
I0714 15:14:15.690247   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-prometheus-cluster-65c76558c4-4jrlp" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.690404   49426 node.go:145] "Pod fits on node" pod="kube-system/kube-prometheus-spilo-5c6899f8d6-qnzq2" node="node2"
I0714 15:14:15.690436   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-prometheus-spilo-5c6899f8d6-qnzq2" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.690575   49426 node.go:148] "Pod does not fit on node" pod="kube-system/kube-proxy-rsps7" node="node2"
I0714 15:14:15.690595   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.690636   49426 node.go:148] "Pod does not fit on node" pod="kube-system/kube-proxy-rsps7" node="node3"
I0714 15:14:15.690654   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.690689   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-proxy-rsps7" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.690821   49426 node.go:145] "Pod fits on node" pod="io/cassandra-0" node="node2"
I0714 15:14:15.690851   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="io/cassandra-0" checks="pod has a PVC and descheduler is configured to ignore PVC pods"
I0714 15:14:15.690980   49426 node.go:148] "Pod does not fit on node" pod="io-logging/elasticsearch-logs-data-1" node="node2"
I0714 15:14:15.691000   49426 node.go:150] "insufficient topolvm.cybozu.com/capacity"
I0714 15:14:15.691047   49426 node.go:148] "Pod does not fit on node" pod="io-logging/elasticsearch-logs-data-1" node="node3"
I0714 15:14:15.691065   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.691083   49426 node.go:150] "insufficient topolvm.cybozu.com/capacity"
I0714 15:14:15.691117   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="io-logging/elasticsearch-logs-data-1" checks="[pod has a PVC and descheduler is configured to ignore PVC pods, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.691243   49426 node.go:145] "Pod fits on node" pod="kube-system/kube-state-metrics-555d9dccc-t5ddm" node="node2"
I0714 15:14:15.691279   49426 nodeutilization.go:283] "Pods on node" node="node1" allPods=22 nonRemovablePods=19 removablePods=3
I0714 15:14:15.691304   49426 nodeutilization.go:290] "Evicting pods based on priority, if they have same priority, they'll be evicted based on QoS tiers"
I0714 15:14:15.691491   49426 evictions.go:161] "Evicted pod in dry run mode" pod="io/kichay-deploy-test-5565f7b678-2cx4j" reason="LowNodeUtilization" strategy="LowNodeUtilization" node="node1"
I0714 15:14:15.691515   49426 nodeutilization.go:323] "Evicted pods" pod="io/kichay-deploy-test-5565f7b678-2cx4j" err=
I0714 15:14:15.691543   49426 nodeutilization.go:348] "Updated node usage" node="node1" CPU=1609 Mem=6503264457 Pods=21
E0714 15:14:15.691599   49426 nodeutilization.go:318] "Error evicting pod" err="Maximum number 1 of evicted pods per \"node1\" node reached" pod="kube-system/csi-rbdplugin-provisioner-8cb6c6b99-5568b"
I0714 15:14:15.691642   49426 nodeutilization.go:294] "Evicted pods from node" node="node1" evictedPods=1 usage=map[cpu:1609m memory:6503264457 pods:21]
I0714 15:14:15.691670   49426 nodeutilization.go:280] "Evicting pods from node" node="node2" usage=map[cpu:1483m memory:5123585820 pods:28]
I0714 15:14:15.691793   49426 node.go:148] "Pod does not fit on node" pod="kube-system/calico-node-49tnp" node="node1"
I0714 15:14:15.691813   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.691861   49426 node.go:148] "Pod does not fit on node" pod="kube-system/calico-node-49tnp" node="node3"
I0714 15:14:15.691882   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.691928   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/calico-node-49tnp" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.692045   49426 node.go:145] "Pod fits on node" pod="kube-system/cluster-autoscaler-74f6bc9c5-gzlhp" node="node1"
I0714 15:14:15.692064   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/cluster-autoscaler-74f6bc9c5-gzlhp" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.692158   49426 node.go:145] "Pod fits on node" pod="kube-system/csi-rbdplugin-provisioner-8cb6c6b99-pk8rf" node="node1"
I0714 15:14:15.692220   49426 node.go:145] "Pod fits on node" pod="io/load-generator-547dd97745-zlmm2" node="node1"
I0714 15:14:15.692290   49426 node.go:145] "Pod fits on node" pod="io-vm/alertmanager-0" node="node1"
I0714 15:14:15.692318   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="io-vm/alertmanager-0" checks="pod has a PVC and descheduler is configured to ignore PVC pods"
I0714 15:14:15.692399   49426 node.go:145] "Pod fits on node" pod="kube-system/eventrouter-57b9b4cd47-mwq5n" node="node1"
I0714 15:14:15.692462   49426 node.go:148] "Pod does not fit on node" pod="kube-system/kube-proxy-rpqw7" node="node1"
I0714 15:14:15.692472   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.692493   49426 node.go:148] "Pod does not fit on node" pod="kube-system/kube-proxy-rpqw7" node="node3"
I0714 15:14:15.692502   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.692528   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-proxy-rpqw7" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.692582   49426 node.go:145] "Pod fits on node" pod="io/seleznev-test-job-1657800600-w5dz8" node="node1"
I0714 15:14:15.692645   49426 node.go:145] "Pod fits on node" pod="io/status-prometheus-6876b6f97b-vbss2" node="node1"
I0714 15:14:15.692808   49426 node.go:145] "Pod fits on node" pod="kube-system/prometheus-adapter-657d784c89-snvr9" node="node1"
I0714 15:14:15.692843   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/prometheus-adapter-657d784c89-snvr9" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.692906   49426 node.go:148] "Pod does not fit on node" pod="kube-system/unbound-cdvzn" node="node1"
I0714 15:14:15.692927   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.692963   49426 node.go:148] "Pod does not fit on node" pod="kube-system/unbound-cdvzn" node="node3"
I0714 15:14:15.692977   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.693002   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/unbound-cdvzn" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.693117   49426 node.go:145] "Pod fits on node" pod="kube-system/vpa-exporter-59bfd6d49c-rvjx6" node="node1"
I0714 15:14:15.693148   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/vpa-exporter-59bfd6d49c-rvjx6" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.693243   49426 node.go:145] "Pod fits on node" pod="io/test-postgres-db12-pooler-78c564589d-lrww8" node="node1"
I0714 15:14:15.693324   49426 node.go:145] "Pod fits on node" pod="io-vm/vmagent-kafka-1" node="node1"
I0714 15:14:15.693398   49426 node.go:145] "Pod fits on node" pod="kube-system/cert-manager-legacy-controller-666fbf7899-zp8dr" node="node1"
I0714 15:14:15.693420   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/cert-manager-legacy-controller-666fbf7899-zp8dr" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.693477   49426 node.go:145] "Pod fits on node" pod="sonobuoy-2gis/sonobuoy" node="node1"
I0714 15:14:15.693497   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="sonobuoy-2gis/sonobuoy" checks="pod does not have any ownerRefs"
I0714 15:14:15.693555   49426 node.go:145] "Pod fits on node" pod="io/test-postgres-db12-0" node="node1"
I0714 15:14:15.693574   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="io/test-postgres-db12-0" checks="pod has a PVC and descheduler is configured to ignore PVC pods"
I0714 15:14:15.693631   49426 node.go:148] "Pod does not fit on node" pod="kube-system/topolvm-lvmd-0-mw97j" node="node1"
I0714 15:14:15.693644   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.693667   49426 node.go:148] "Pod does not fit on node" pod="kube-system/topolvm-lvmd-0-mw97j" node="node3"
I0714 15:14:15.693679   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.693700   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/topolvm-lvmd-0-mw97j" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.693758   49426 node.go:145] "Pod fits on node" pod="io/dekhtyarev-nginx-7945ff7886-mls68" node="node1"
I0714 15:14:15.693817   49426 node.go:145] "Pod fits on node" pod="io-logging/redis-logs-1" node="node1"
I0714 15:14:15.693881   49426 node.go:145] "Pod fits on node" pod="kube-system/kube-prometheus-cadvisor-6b9c9dbc54-plwcb" node="node1"
I0714 15:14:15.693905   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-prometheus-cadvisor-6b9c9dbc54-plwcb" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.693963   49426 node.go:145] "Pod fits on node" pod="kube-system/kube-prometheus-apps-5f768b48f4-xxbbl" node="node1"
I0714 15:14:15.693984   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-prometheus-apps-5f768b48f4-xxbbl" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.694043   49426 node.go:145] "Pod fits on node" pod="kube-system/kube-prometheus-nodes-868cf59454-brsvd" node="node1"
I0714 15:14:15.694064   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/kube-prometheus-nodes-868cf59454-brsvd" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.694119   49426 node.go:148] "Pod does not fit on node" pod="kube-system/topolvm-node-ckg5g" node="node1"
I0714 15:14:15.694132   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.694155   49426 node.go:148] "Pod does not fit on node" pod="kube-system/topolvm-node-ckg5g" node="node3"
I0714 15:14:15.694168   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.694188   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/topolvm-node-ckg5g" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.694268   49426 node.go:145] "Pod fits on node" pod="io/io-grafana-staging-0" node="node1"
I0714 15:14:15.694295   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="io/io-grafana-staging-0" checks="pod has a PVC and descheduler is configured to ignore PVC pods"
I0714 15:14:15.694348   49426 node.go:148] "Pod does not fit on node" pod="io-logging/elasticsearch-logs-master-0" node="node1"
I0714 15:14:15.694357   49426 node.go:150] "insufficient memory"
I0714 15:14:15.694374   49426 node.go:148] "Pod does not fit on node" pod="io-logging/elasticsearch-logs-master-0" node="node3"
I0714 15:14:15.694383   49426 node.go:150] "pod node selector does not match the node label"
I0714 15:14:15.694398   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="io-logging/elasticsearch-logs-master-0" checks="[pod has a PVC and descheduler is configured to ignore PVC pods, pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable]"
I0714 15:14:15.694447   49426 node.go:145] "Pod fits on node" pod="kube-system/calico-kube-controllers-7f74bfffbf-p8s67" node="node1"
I0714 15:14:15.694462   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/calico-kube-controllers-7f74bfffbf-p8s67" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.694513   49426 node.go:145] "Pod fits on node" pod="kube-system/metrics-server-74bd9c78f5-dbmln" node="node1"
I0714 15:14:15.694528   49426 evictions.go:348] "Pod lacks an eviction annotation and fails the following checks" pod="kube-system/metrics-server-74bd9c78f5-dbmln" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
I0714 15:14:15.694540   49426 nodeutilization.go:283] "Pods on node" node="node2" allPods=28 nonRemovablePods=19 removablePods=9
I0714 15:14:15.694551   49426 nodeutilization.go:290] "Evicting pods based on priority, if they have same priority, they'll be evicted based on QoS tiers"
I0714 15:14:15.694680   49426 evictions.go:161] "Evicted pod in dry run mode" pod="kube-system/csi-rbdplugin-provisioner-8cb6c6b99-pk8rf" reason="LowNodeUtilization" strategy="LowNodeUtilization" node="node2"
I0714 15:14:15.694691   49426 nodeutilization.go:323] "Evicted pods" pod="kube-system/csi-rbdplugin-provisioner-8cb6c6b99-pk8rf" err=
I0714 15:14:15.694705   49426 nodeutilization.go:348] "Updated node usage" node="node2" CPU=1423 Mem=4922259228 Pods=27
E0714 15:14:15.694723   49426 nodeutilization.go:318] "Error evicting pod" err="Maximum number 1 of evicted pods per \"node2\" node reached" pod="io/load-generator-547dd97745-zlmm2"
I0714 15:14:15.694741   49426 nodeutilization.go:294] "Evicted pods from node" node="node2" evictedPods=1 usage=map[cpu:1423m memory:4922259228 pods:27]
I0714 15:14:15.694754   49426 lownodeutilization.go:184] "Total number of pods evicted" evictedPods=2
I0714 15:14:15.694766   49426 descheduler.go:304] "Number of evicted pods" totalEvicted=2
I0714 15:14:15.694850   49426 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
I0714 15:14:15.694864   49426 watch.go:183] Stopping fake watcher.
I0714 15:14:15.694889   49426 reflector.go:225] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0714 15:14:15.694897   49426 reflector.go:225] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0714 15:14:15.694936   49426 reflector.go:225] Stopping reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:134

To me it looks like nodeFit checks whether the pod can be scheduled on any other node (not only on the underutilized nodes), so the descheduler evicts pods from one overutilized node to another overutilized node over and over again.
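
A rough sketch of that suspicion (hypothetical types and helpers hard-coded to mirror the cluster above, not the descheduler's actual code): when the fit check runs against every other node, a role=worker pod from node1 still "fits" on node2, which is itself overutilized, while node3, the only underutilized node, is ruled out by its nodeSelector.

package main

import "fmt"

// Minimal stand-ins for the relevant parts of a Pod and a Node;
// these are not the real client-go types.
type node struct {
    name   string
    labels map[string]string
}

type pod struct {
    name         string
    nodeSelector map[string]string
}

// matchesNodeSelector reports whether every key/value in the pod's
// nodeSelector is present in the node's labels.
func matchesNodeSelector(p pod, n node) bool {
    for k, v := range p.nodeSelector {
        if n.labels[k] != v {
            return false
        }
    }
    return true
}

// fitsOnAny reports whether the pod matches at least one candidate node.
func fitsOnAny(p pod, candidates []node) bool {
    for _, n := range candidates {
        if matchesNodeSelector(p, n) {
            return true
        }
    }
    return false
}

func main() {
    node2 := node{name: "node2", labels: map[string]string{"role": "worker"}}  // overutilized
    node3 := node{name: "node3", labels: map[string]string{"role": "worker2"}} // underutilized

    p := pod{name: "example-pod", nodeSelector: map[string]string{"role": "worker"}}

    // What the logs suggest nodeFit does today: check all other nodes.
    fmt.Println(fitsOnAny(p, []node{node2, node3})) // true  -> pod is treated as evictable

    // What I expected: check only the underutilized (destination) nodes.
    fmt.Println(fitsOnAny(p, []node{node3})) // false -> pod would be kept in place
}

With the candidate list limited to the underutilized nodes, the pod would be treated as non-evictable, which matches the behaviour I expected.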

seleznev avatar Jul 14 '22 12:07 seleznev

When checking whether the pod will fit on any of the given nodes at https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/descheduler/evictions/evictions.go#L316, the nodes var should contain only the destination nodes, don't you think?
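
For illustration, reusing the hypothetical node type from the sketch in the previous comment (these names are made up, not the descheduler's actual API), the filtering could look roughly like this:

// restrictToDestinations keeps only the nodes the strategy actually wants
// to move pods to (the underutilized ones), so the fit check cannot be
// satisfied by another overutilized node.
func restrictToDestinations(all []node, underutilized map[string]bool) []node {
    var destinations []node
    for _, n := range all {
        if underutilized[n.name] {
            destinations = append(destinations, n)
        }
    }
    return destinations
}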

ljuaneda avatar Aug 08 '22 14:08 ljuaneda

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 06 '22 14:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 06 '22 15:12 k8s-triage-robot

We are still running into this issue from time to time (idle nodes with a different nodeSelector result in continuous pod eviction across the whole cluster).

/remove-lifecycle rotten

msw-kialo avatar Dec 13 '22 07:12 msw-kialo

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 13 '23 08:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Apr 12 '23 09:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar May 12 '23 09:05 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar May 12 '23 09:05 k8s-ci-robot