Add a "Do Not Deschedule" label
**Is your feature request related to a problem? Please describe.**
I have a k8s cluster of small nodes and I'm trying to deploy Sonatype Nexus (OSS version) to it. My own pods are quite small -- all using <500Mi/250m memory and CPU limits. Nexus however is huge by comparison with a 4Gi memory limit and wanting 500m CPU. When I schedule nexus, descheduler notices the memory pressure on its node, notes that other nodes have lots of free memory, and deschedules it. As soon as it is running on this other node for a couple of minutes, descheduler does it again. And again.
The result is that I can't keep nexus alive for more than a couple of minutes.
**Describe the solution you'd like**
The easiest thing for me would be to put a 'do not deschedule' label on the nexus pod, to instruct descheduler to leave this one alone.
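For illustration, the requested feature might look something like this on the pod — note that no such label exists in descheduler 0.23.1; the key name here is made up purely to show the shape of the request:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nexus
  labels:
    # Hypothetical label; this is the feature being requested,
    # not an existing descheduler API.
    descheduler.example.io/do-not-deschedule: "true"
```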
**Describe alternatives you've considered**
I know that I can set a priority class to achieve the same thing, but that would be to misuse the priorities available to me (node- or cluster-critical).

**What version of descheduler are you using?**
descheduler version: Image label: 0.23.1
**Additional context**
I'm using the default yaml:
```yaml
strategies:
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        targetThresholds:
          cpu: 50
          memory: 50
          pods: 50
        thresholds:
          cpu: 20
          memory: 20
          pods: 20
  RemoveDuplicates:
    enabled: true
  RemovePodsViolatingInterPodAntiAffinity:
    enabled: true
  RemovePodsViolatingNodeAffinity:
    enabled: true
    params:
      nodeAffinityType:
        - requiredDuringSchedulingIgnoredDuringExecution
  RemovePodsViolatingNodeTaints:
    enabled: true
```
Hi @BryanDollery, I understand not wanting to overload your existing priority classes when it's not appropriate. Do you have the ability to create new (possibly non-preempting) priority classes? In the past when this has come up, we've chosen to recommend that because it offers a more well-defined eviction hierarchy that aligns with how the scheduler works. But if you aren't able to do that, it could show a use case for implementing this.
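The recommended workaround could be sketched as a dedicated, non-preempting PriorityClass — the name and value below are assumptions chosen for illustration, not anything defined by descheduler:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: keep-in-place            # illustrative name
value: 1000000                   # above workload defaults, well below the system-* classes
preemptionPolicy: Never          # pods with this class never preempt others
globalDefault: false
description: "Workloads that descheduler should be reluctant to evict."
```

A pod opts in by setting `priorityClassName: keep-in-place` in its spec. Because `preemptionPolicy: Never` is set, the higher priority affects eviction ordering without letting these pods push others out of the scheduler queue.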
(linking the prior issues just so they're tied together https://github.com/kubernetes-sigs/descheduler/issues/422 https://github.com/kubernetes-sigs/descheduler/issues/329)
Any update on this?
"Don't deschedule if already running" is an orthogonal thing from priority class, I think. We have some deployments which have a defined maintenance window and their pods should only be descheduled during that window. New pods from these deployments should not fill the scheduler queue ahead of system critical pods, yet those system critical pods can be descheduled any time since they are HA and have PDBs protecting their minimum availability.
An annotation to prevent descheduling would neatly solve this, and would also solve things like #423 (Re that issue: I personally think adding time-of-day stuff to descheduler is unnecessary complexity, since it's very simple to make a CronJob that removes a not-evictable annotation at the start of a maintenance window and adds it back at the end. We already do this with cluster-autoscaler's safe-to-evict annotation.)
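The CronJob approach described above might look roughly like this — the selector, schedule, image, and service account are all assumptions, and a mirror-image CronJob (not shown) would re-add the annotation when the window closes:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: open-maintenance-window
spec:
  schedule: "0 2 * * 6"          # e.g. Saturdays at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: annotation-toggler   # needs RBAC to patch pods
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - annotate
                - pods
                - --selector=app=nexus
                - --overwrite
                # marks the pods evictable for the duration of the window
                - cluster-autoscaler.kubernetes.io/safe-to-evict=true
```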
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:

> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.