Add a "Do Not Deschedule" label

BryanDollery opened this issue 3 years ago

Is your feature request related to a problem? Please describe.

I have a k8s cluster of small nodes and I'm trying to deploy Sonatype Nexus (OSS version) to it. My own pods are quite small -- all with memory and CPU limits under 500Mi/250m. Nexus, however, is huge by comparison, with a 4Gi memory limit and wanting 500m of CPU. When I schedule Nexus, the descheduler notices the memory pressure on its node, notes that other nodes have lots of free memory, and deschedules it. As soon as it has been running on the other node for a couple of minutes, the descheduler does it again. And again.

The result is that I can't keep Nexus alive for more than a couple of minutes.

Describe the solution you'd like

The easiest thing for me would be to put a 'do not deschedule' label on the Nexus pod, to instruct the descheduler to leave this one alone.
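
For illustration only, a minimal sketch of what that opt-out might look like on the Nexus Deployment. The label key used here is hypothetical (it is not part of any released descheduler API) and the image name is assumed:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nexus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nexus
  template:
    metadata:
      labels:
        app: nexus
        # Hypothetical opt-out label: not an existing descheduler API,
        # just the shape of what this issue is asking for.
        descheduler.kubernetes.io/do-not-deschedule: "true"
    spec:
      containers:
      - name: nexus
        image: sonatype/nexus3   # image name assumed
        resources:
          limits:
            memory: 4Gi
            cpu: 500m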

Describe alternatives you've considered

I know that I can set a priority class to achieve the same thing, but that would misuse the only priorities available to me (node-critical or cluster-critical).

What version of descheduler are you using?

descheduler version: Image label: 0.23.1

Additional context

I'm using the default YAML:

strategies:
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        targetThresholds:
          cpu: 50
          memory: 50
          pods: 50
        thresholds:
          cpu: 20
          memory: 20
          pods: 20
  RemoveDuplicates:
    enabled: true
  RemovePodsViolatingInterPodAntiAffinity:
    enabled: true
  RemovePodsViolatingNodeAffinity:
    enabled: true
    params:
      nodeAffinityType:
      - requiredDuringSchedulingIgnoredDuringExecution
  RemovePodsViolatingNodeTaints:
    enabled: true

BryanDollery · May 19 '22 10:05

Hi @BryanDollery, I understand not wanting to overload your existing priority classes where it isn't appropriate. Do you have the ability to create new (possibly non-preempting) priority classes? In the past when this has come up, we've recommended that approach because it offers a better-defined eviction hierarchy that aligns with how the scheduler works. But if you aren't able to do that, this could demonstrate a use case for implementing the label.

(linking the prior issues just so they're tied together https://github.com/kubernetes-sigs/descheduler/issues/422 https://github.com/kubernetes-sigs/descheduler/issues/329)
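
A rough sketch of the priority-class approach described above, assuming the descheduler version in use supports the per-strategy priority threshold params (thresholdPriority / thresholdPriorityClassName); the class name and value here are illustrative:

# Non-preempting PriorityClass: Nexus gets a high priority for eviction
# purposes without jumping the scheduling queue ahead of other workloads.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: keep-in-place
value: 1000000
preemptionPolicy: Never
globalDefault: false
description: "Pods at or above this priority are skipped by the descheduler."

The Nexus pod template would then set priorityClassName: keep-in-place, and the LowNodeUtilization strategy would point its threshold at the same class so pods at or above that priority are not evicted:

strategies:
  LowNodeUtilization:
    enabled: true
    params:
      # Assumes this descheduler version supports priority filtering;
      # pods with priority >= the class's value are left alone.
      thresholdPriorityClassName: keep-in-place
      nodeResourceUtilizationThresholds:
        targetThresholds:
          cpu: 50
          memory: 50
          pods: 50
        thresholds:
          cpu: 20
          memory: 20
          pods: 20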

damemi · May 19 '22 13:05

Any update on this?

sudughonge · Jul 11 '22 21:07

"Don't deschedule if already running" is an orthogonal thing from priority class, I think. We have some deployments which have a defined maintenance window and their pods should only be descheduled during that window. New pods from these deployments should not fill the scheduler queue ahead of system critical pods, yet those system critical pods can be descheduled any time since they are HA and have PDBs protecting their minimum availability.

An annotation to prevent descheduling would neatly solve this, and would also solve things like #423 (Re that issue: I personally think adding time-of-day stuff to descheduler is unnecessary complexity, since it's very simple to make a CronJob that removes a not-evictable annotation at the start of a maintenance window and adds it back at the end. We already do this with cluster-autoscaler's safe-to-evict annotation.)
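
As a concrete illustration of that pattern, here is a rough sketch of a CronJob that clears the protection when a maintenance window opens; a mirror-image CronJob would set it back to false when the window closes. The cluster-autoscaler safe-to-evict annotation is used because it exists today; a descheduler opt-out annotation, if one were added, could be toggled the same way. The schedule, namespace, label selector, image, and ServiceAccount (and its RBAC, not shown) are assumptions:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: open-maintenance-window
spec:
  schedule: "0 2 * * *"            # window opens at 02:00 (illustrative)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: annotation-toggler   # needs RBAC to annotate pods (not shown)
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest          # any image with kubectl works
            command:
            - kubectl
            - annotate
            - pods
            - --selector=app=nexus
            - --overwrite
            - cluster-autoscaler.kubernetes.io/safe-to-evict=true

Note that annotations applied directly to pods are lost when a pod is recreated, so for long windows it may be safer to patch the Deployment's pod template instead.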

jbg · Aug 28 '22 09:08

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Nov 26 '22 10:11

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · Dec 26 '22 11:12

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot · Jan 25 '23 11:01

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · Jan 25 '23 11:01