descheduler
priority threshold misconfigured, only one of priorityThreshold fields can be set
What version of descheduler are you using?
descheduler version: v0.27.1
Does this issue reproduce with the latest release?
Yes.
Which descheduler CLI options are you using?
Helm Chart defaults:
args
- args:
  - --policy-config-file
  - /policy-dir/policy.yaml
  - --descheduling-interval
  - 10m
  - --v
  - "4"
Please provide a copy of your descheduler policy config file
policy ConfigMap
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    profiles:
    - name: GenericProfile
      pluginConfig:
      - args:
          evictFailedBarePods: true
          evictLocalStoragePods: true
          nodeFit: true
          priorityThreshold:
            name: exclude-descheduler
        name: DefaultEvictor
      - args:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
        name: HighNodeUtilization
      - args:
          targetThresholds:
            cpu: 70
            memory: 70
            pods: 70
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
        name: LowNodeUtilization
      - args:
          maxPodLifeTimeSeconds: 7200
          states:
          - ContainerCreating
          - Pending
          - PodInitializing
        name: PodLifeTime
      - args:
          excludeOwnerKinds:
          - ReplicaSet
        name: RemoveDuplicates
      - args:
          excludeOwnerKinds:
          - Job
          includingInitContainers: true
          minPodLifetimeSeconds: 3600
        name: RemoveFailedPods
      - args:
          includingInitContainers: true
          podRestartThreshold: 10
        name: RemovePodsHavingTooManyRestarts
      - name: RemovePodsViolatingInterPodAntiAffinity
      - args:
          nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
        name: RemovePodsViolatingNodeAffinity
      - name: RemovePodsViolatingNodeTaints
      - name: RemovePodsViolatingTopologySpreadConstraint
      plugins:
        balance:
          enabled:
          - HighNodeUtilization
          - LowNodeUtilization
          - RemoveDuplicates
          - RemovePodsViolatingTopologySpreadConstraint
        deschedule:
          enabled:
          - PodLifeTime
          - RemoveFailedPods
          - RemovePodsHavingTooManyRestarts
          - RemovePodsViolatingInterPodAntiAffinity
          - RemovePodsViolatingNodeAffinity
          - RemovePodsViolatingNodeTaints
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: descheduler
    meta.helm.sh/release-namespace: kube-system
    reloader.stakater.com/match: "true"
  creationTimestamp: "2023-08-16T09:22:39Z"
  labels:
    app.kubernetes.io/instance: descheduler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: descheduler
    app.kubernetes.io/version: 0.27.1
    helm.sh/chart: descheduler-0.27.1
    helm.toolkit.fluxcd.io/name: descheduler
    helm.toolkit.fluxcd.io/namespace: kube-system
  name: descheduler
  namespace: kube-system
  resourceVersion: "467902404"
  uid: 600f131d-1515-4ae7-a9ef-1dc0963a247d
What k8s version are you using (kubectl version)?
kubectl version Output
$ kubectl version
Client Version: v1.25.7
Kustomize Version: v4.5.7
Server Version: v1.25.11-eks-a5565ad
What did you do?
With the ConfigMap and policy above, descheduler fails to start:
NAME                           READY   STATUS             RESTARTS      AGE
descheduler-644697d794-g5z47   0/1     CrashLoopBackOff   6 (63s ago)   7m14s
Logs show a problem with priorityThreshold:
I0816 11:09:33.663355 1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1692184173\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1692184171\" (2023-08-16 10:09:29 +0000 UTC to 2024-08-15 10:09:29 +0000 UTC (now=2023-08-16 11:09:33.663315485 +0000 UTC))"
I0816 11:09:33.663419 1 secure_serving.go:210] Serving securely on [::]:10258
I0816 11:09:33.663508 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
E0816 11:09:33.664577 1 server.go:99] "descheduler server" err="in profile GenericProfile: priority threshold misconfigured, only one of priorityThreshold fields can be set, got &TypeMeta{Kind:,APIVersion:,}"
I0816 11:09:33.664705 1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
I0816 11:09:33.664777 1 secure_serving.go:255] Stopped listening on [::]:10258
There is no priorityThreshold.value specified in the policy/ConfigMap, yet descheduler complains that only one of the fields can be set. If I remove the name key and replace it with value set to some reasonable number, descheduler starts successfully (the adjusted DefaultEvictor args are sketched after the log excerpt below):
I0816 09:33:54.749414 1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1692178434\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1692178432\" (2023-08-16 08:33:50 +0000 UTC to 2024-08-15 08:33:50 +0000 UTC (now=2023-08-16 09:33:54.749372596 +0000 UTC))"
I0816 09:33:54.749494 1 secure_serving.go:210] Serving securely on [::]:10258
I0816 09:33:54.749578 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0816 09:33:54.760784 1 descheduler.go:123] Warning: Convert Kubernetes server minor version to float fail
W0816 09:33:54.760796 1 descheduler.go:127] Warning: Descheduler minor version 27 is not supported on your version of Kubernetes 1.25+. See compatibility docs for more info: https://github.com/kubernetes-sigs/descheduler#compatibility-matrix
I0816 09:33:54.768692 1 reflector.go:287] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.768708 1 reflector.go:323] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.768925 1 reflector.go:287] Starting reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.768941 1 reflector.go:323] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769088 1 reflector.go:287] Starting reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769102 1 reflector.go:323] Listing and watching *v1.Namespace from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769240 1 reflector.go:287] Starting reflector *v1.PriorityClass (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769252 1 reflector.go:323] Listing and watching *v1.PriorityClass from k8s.io/client-go/informers/factory.go:150
I0816 09:33:55.347709 1 shared_informer.go:341] caches populated
I0816 09:33:55.347786 1 shared_informer.go:341] caches populated
I0816 09:33:55.347800 1 shared_informer.go:341] caches populated
I0816 09:33:56.748637 1 shared_informer.go:341] caches populated
I0816 09:33:56.751015 1 descheduler.go:292] Building a pod evictor
I0816 09:33:56.751071 1 defaultevictor.go:76] "Warning: EvictFailedBarePods is set to True. This could cause eviction of pods without ownerReferences."
I0816 09:33:56.751118 1 pod_lifetime.go:109] "Processing node" node="ip-10-254-48-230.ec2.internal"
[...]
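For reference, a sketch of what the adjusted DefaultEvictor pluginConfig entry looks like; the threshold number below is only an example value, not one taken from the cluster:

- args:
    evictFailedBarePods: true
    evictLocalStoragePods: true
    nodeFit: true
    priorityThreshold:
      value: 1000000    # example value; swapping the "name" field for "value" avoids the startup error
  name: DefaultEvictor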
What did you expect to see?
Running descheduler pod.
What did you see instead?
descheduler failing to start with the error shown above.
Hi @mstefany Thank you for all the details!
However, I am unable to reproduce this issue. Is it possible that you have multiple profiles defined in the policy?
I am also hitting the same situation as @mstefany.
Nope, there shouldn't be anything additional except what I posted. No multiple profiles, etc. One thing, however: I think I don't use the "default" profile name.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.