
priority threshold misconfigured, only one of priorityThreshold fields can be set

Open mstefany opened this issue 2 years ago • 5 comments

What version of descheduler are you using?

descheduler version: v0.27.1

Does this issue reproduce with the latest release?

Yes.

Which descheduler CLI options are you using?

Helm Chart defaults:

args
  - args:
    - --policy-config-file
    - /policy-dir/policy.yaml
    - --descheduling-interval
    - 10m
    - --v
    - "4"

Please provide a copy of your descheduler policy config file

policy ConfigMap
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    profiles:
    - name: GenericProfile
      pluginConfig:
      - args:
          evictFailedBarePods: true
          evictLocalStoragePods: true
          nodeFit: true
          priorityThreshold:
            name: exclude-descheduler
        name: DefaultEvictor
      - args:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
        name: HighNodeUtilization
      - args:
          targetThresholds:
            cpu: 70
            memory: 70
            pods: 70
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
        name: LowNodeUtilization
      - args:
          maxPodLifeTimeSeconds: 7200
          states:
          - ContainerCreating
          - Pending
          - PodInitializing
        name: PodLifeTime
      - args:
          excludeOwnerKinds:
          - ReplicaSet
        name: RemoveDuplicates
      - args:
          excludeOwnerKinds:
          - Job
          includingInitContainers: true
          minPodLifetimeSeconds: 3600
        name: RemoveFailedPods
      - args:
          includingInitContainers: true
          podRestartThreshold: 10
        name: RemovePodsHavingTooManyRestarts
      - name: RemovePodsViolatingInterPodAntiAffinity
      - args:
          nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
        name: RemovePodsViolatingNodeAffinity
      - name: RemovePodsViolatingNodeTaints
      - name: RemovePodsViolatingTopologySpreadConstraint
      plugins:
        balance:
          enabled:
          - HighNodeUtilization
          - LowNodeUtilization
          - RemoveDuplicates
          - RemovePodsViolatingTopologySpreadConstraint
        deschedule:
          enabled:
          - PodLifeTime
          - RemoveFailedPods
          - RemovePodsHavingTooManyRestarts
          - RemovePodsViolatingInterPodAntiAffinity
          - RemovePodsViolatingNodeAffinity
          - RemovePodsViolatingNodeTaints
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: descheduler
    meta.helm.sh/release-namespace: kube-system
    reloader.stakater.com/match: "true"
  creationTimestamp: "2023-08-16T09:22:39Z"
  labels:
    app.kubernetes.io/instance: descheduler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: descheduler
    app.kubernetes.io/version: 0.27.1
    helm.sh/chart: descheduler-0.27.1
    helm.toolkit.fluxcd.io/name: descheduler
    helm.toolkit.fluxcd.io/namespace: kube-system
  name: descheduler
  namespace: kube-system
  resourceVersion: "467902404"
  uid: 600f131d-1515-4ae7-a9ef-1dc0963a247d

What k8s version are you using (kubectl version)?

kubectl version Output
$ kubectl version
Client Version: v1.25.7
Kustomize Version: v4.5.7
Server Version: v1.25.11-eks-a5565ad

What did you do?

With the policy ConfigMap above in place, descheduler fails to start:

NAME                           READY   STATUS             RESTARTS      AGE
descheduler-644697d794-g5z47   0/1     CrashLoopBackOff   6 (63s ago)   7m14s

The logs show a problem with priorityThreshold:

I0816 11:09:33.663355       1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1692184173\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1692184171\" (2023-08-16 10:09:29 +0000 UTC to 2024-08-15 10:09:29 +0000 UTC (now=2023-08-16 11:09:33.663315485 +0000 UTC))"
I0816 11:09:33.663419       1 secure_serving.go:210] Serving securely on [::]:10258
I0816 11:09:33.663508       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
E0816 11:09:33.664577       1 server.go:99] "descheduler server" err="in profile GenericProfile: priority threshold misconfigured, only one of priorityThreshold fields can be set, got &TypeMeta{Kind:,APIVersion:,}"
I0816 11:09:33.664705       1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
I0816 11:09:33.664777       1 secure_serving.go:255] Stopped listening on [::]:10258

There is no priorityThreshold.value specified in the policy/ConfigMap, yet the error complains as if more than one field were set. If I remove the name key and replace it with value set to some reasonable number, descheduler starts successfully:

I0816 09:33:54.749414       1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1692178434\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1692178432\" (2023-08-16 08:33:50 +0000 UTC to 2024-08-15 08:33:50 +0000 UTC (now=2023-08-16 09:33:54.749372596 +0000 UTC))"
I0816 09:33:54.749494       1 secure_serving.go:210] Serving securely on [::]:10258
I0816 09:33:54.749578       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0816 09:33:54.760784       1 descheduler.go:123] Warning: Convert Kubernetes server minor version to float fail
W0816 09:33:54.760796       1 descheduler.go:127] Warning: Descheduler minor version 27 is not supported on your version of Kubernetes 1.25+. See compatibility docs for more info: https://github.com/kubernetes-sigs/descheduler#compatibility-matrix
I0816 09:33:54.768692       1 reflector.go:287] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.768708       1 reflector.go:323] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.768925       1 reflector.go:287] Starting reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.768941       1 reflector.go:323] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769088       1 reflector.go:287] Starting reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769102       1 reflector.go:323] Listing and watching *v1.Namespace from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769240       1 reflector.go:287] Starting reflector *v1.PriorityClass (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769252       1 reflector.go:323] Listing and watching *v1.PriorityClass from k8s.io/client-go/informers/factory.go:150
I0816 09:33:55.347709       1 shared_informer.go:341] caches populated
I0816 09:33:55.347786       1 shared_informer.go:341] caches populated
I0816 09:33:55.347800       1 shared_informer.go:341] caches populated
I0816 09:33:56.748637       1 shared_informer.go:341] caches populated
I0816 09:33:56.751015       1 descheduler.go:292] Building a pod evictor
I0816 09:33:56.751071       1 defaultevictor.go:76] "Warning: EvictFailedBarePods is set to True. This could cause eviction of pods without ownerReferences."
I0816 09:33:56.751118       1 pod_lifetime.go:109] "Processing node" node="ip-10-254-48-230.ec2.internal"
[...]
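
For reference, a minimal sketch of the two DefaultEvictor variants described above (the numeric priority below is only an illustrative example):

# fails validation: "only one of priorityThreshold fields can be set"
- name: DefaultEvictor
  args:
    priorityThreshold:
      name: exclude-descheduler

# starts successfully (10000 is just an example value)
- name: DefaultEvictor
  args:
    priorityThreshold:
      value: 10000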

What did you expect to see?

A running descheduler pod.

What did you see instead?

descheduler failing to start with the error shown above.

mstefany avatar Aug 16 '23 11:08 mstefany

Hi @mstefany Thank you for all the details!

However, I am unable to reproduce this issue. Is it possible that you have multiple profiles defined in the policy?

a7i avatar Sep 07 '23 15:09 a7i

I also hit the same situation as @mstefany.

gj199575 avatar Sep 19 '23 11:09 gj199575

Hi @mstefany Thank you for all the details!

However, I am unable to reproduce this issue. Is it possible that you have multiple profiles defined in the policy?

Nope, there shouldn't be anything beyond what I posted: no multiple profiles, etc. One thing, however: I think I don't use the "default" profile name.

mstefany avatar Sep 19 '23 11:09 mstefany
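
Aside for anyone hitting the same symptom: since the pod crash-loops, a quick way to rule out stray profiles is to dump the rendered policy straight from the ConfigMap shown above:

$ kubectl -n kube-system get configmap descheduler -o yaml

and check that policy.yaml contains exactly one entry under profiles.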

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 28 '24 15:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 27 '24 16:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 28 '24 17:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 28 '24 17:03 k8s-ci-robot