
Does the descheduler support Argo Rollouts?

isekiro opened this issue on Oct 14 '22

The descheduler does not take effect when the workload's resource type is an Argo Rollout.

isekiro commented on Oct 14 '22

Hi @isekiro, I'm not familiar with Argo Rollouts, so it's not clear to me what the ask here is. Are you trying to deschedule pods that have been rolled out with Argo?

damemi commented on Oct 14 '22

Hi @damemi, Argo Rollouts is a Kubernetes controller and a set of CRDs. Can the descheduler evict pods that are taken over by Argo? https://argoproj.github.io/rollouts/

isekiro commented on Oct 17 '22

Looking at that briefly, the Pods look like normal pods, so it should work. Could you provide the `kubectl get pod/<name> -o yaml` output for one of them to confirm?
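For reference, the part of that output worth checking is the ownerReferences block. As far as I understand, pods created by Argo Rollouts are owned by a ReplicaSet that the Rollout controller manages, so the relevant section might look roughly like this (all names below are illustrative placeholders):

```yaml
# Trimmed, illustrative `kubectl get pod <name> -o yaml` output; names are placeholders.
metadata:
  name: demo-rollout-6cf78c66c5-x7k2p
  ownerReferences:
    - apiVersion: apps/v1
      kind: ReplicaSet        # an owner kind the descheduler recognizes
      name: demo-rollout-6cf78c66c5
      controller: true
      blockOwnerDeletion: true
```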

One thing is the descheduler might not recognize the Argo Rollout as an owning controller. In that case, it wouldn't evict these pods because it thinks they are bare:

Pods (static or mirrored pods or standalone pods) not part of a ReplicationController, ReplicaSet (Deployment), StatefulSet, or Job are never evicted because these pods won't be recreated.

You can override this by setting the `descheduler.alpha.kubernetes.io/evict` annotation on the pod, which causes all filtering criteria to be ignored (so use it carefully).
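For example, a minimal manifest carrying that annotation might look like this (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rollout-managed-pod           # placeholder name
  annotations:
    # Tells the descheduler this pod may be evicted even if it fails
    # the usual filters (e.g. no recognized owning controller).
    descheduler.alpha.kubernetes.io/evict: "true"
spec:
  containers:
    - name: app
      image: nginx:1.25               # placeholder image
```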

damemi commented on Oct 17 '22

@damemi Thank you for your reply. The pods managed by the Argo Rollout are normal pods. Looking at the CronJob's log, though, the pods are not being redistributed across the nodes according to the LowNodeUtilization strategy after they are evicted.

This is part of the log from the descheduler:

```
I1016 04:53:47.960296 1 lownodeutilization.go:332] "Evicted pods from node" node="k8s-worker-107" evictedPods=6 usage=map[cpu:9298m memory:125112Mi pods:104]
I1016 04:53:47.960399 1 lownodeutilization.go:318] "Evicting pods from node" node="k8s-worker-102" usage=map[cpu:8429m memory:123140Mi pods:110]
I1016 04:53:47.961279 1 lownodeutilization.go:321] "Pods on node" node="k8s-worker-102" allPods=110 nonRemovablePods=110 removablePods=0
I1016 04:53:47.961320 1 lownodeutilization.go:324] "No removable pods on node, try next node" node="k8s-worker-102"
I1016 04:53:47.961369 1 lownodeutilization.go:318] "Evicting pods from node" node="k8s-worker-111" usage=map[cpu:9248m memory:122086Mi pods:110]
I1016 04:53:47.962010 1 lownodeutilization.go:321] "Pods on node" node="k8s-worker-111" allPods=110 nonRemovablePods=110 removablePods=0
I1016 04:53:47.962056 1 lownodeutilization.go:324] "No removable pods on node, try next node" node="k8s-worker-111"
I1016 04:53:47.962201 1 lownodeutilization.go:153] "Total number of pods evicted" evictedPods=29
```
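For reference, the LowNodeUtilization strategy is configured along these lines (a sketch of the v1alpha1 policy format; the thresholds below are illustrative placeholders, not our exact values):

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        # Nodes below all of these are considered underutilized.
        thresholds:
          cpu: 20
          memory: 20
          pods: 20
        # Nodes above any of these are candidates for eviction.
        targetThresholds:
          cpu: 50
          memory: 50
          pods: 50
```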

isekiro commented on Oct 18 '22

It looks like that is coming from the nodeutilization function doing its own pod filtering, which I think is just re-using the PodEvictor (@ingvagabund or @knelasevero, correct me if I'm wrong).

If that's the case @isekiro, the pods are probably being excluded for one of the normal eviction filtering reasons (owning controller, priority class, local storage, and so on). Do your pods match any of those criteria?
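For instance, a few quick spot-checks against the usual filters (`<name>` and `<ns>` are placeholders for your pod's name and namespace):

```sh
# Hypothetical spot-checks for common non-evictable conditions.
kubectl get pod <name> -n <ns> -o jsonpath='{.metadata.ownerReferences[*].kind}'  # owner kind (bare pod?)
kubectl get pod <name> -n <ns> -o jsonpath='{.spec.priorityClassName}'            # system/critical priority?
kubectl get pod <name> -n <ns> -o jsonpath='{.spec.volumes[*].emptyDir}'          # local storage (emptyDir)?
kubectl get pod <name> -n <ns> -o jsonpath='{.metadata.annotations}'              # evict annotation set?
```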

For our team, we should update this to log the reason each pod is filtered, similar to how we do when calling EvictPod.
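A minimal sketch of that idea (a hypothetical helper, not the actual descheduler code; the `namedFilter` type and `isEvictable` function are invented here for illustration):

```go
// Hypothetical sketch of surfacing the reason a pod is filtered out,
// similar to the logging done when EvictPod is called. Not the actual
// descheduler code; namedFilter and isEvictable are assumptions.
package evictions

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/klog/v2"
)

// namedFilter pairs an evictability predicate with a human-readable reason.
type namedFilter struct {
	reason string
	fn     func(*v1.Pod) bool // true means the pod passes this filter
}

// isEvictable runs each filter in order and logs the first reason a pod
// is considered non-removable, instead of silently skipping it.
func isEvictable(pod *v1.Pod, filters []namedFilter) bool {
	for _, f := range filters {
		if !f.fn(pod) {
			klog.V(2).InfoS("Pod not evictable", "pod", klog.KObj(pod), "reason", f.reason)
			return false
		}
	}
	return true
}
```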

damemi commented on Oct 19 '22

Yes, it is reusing the PodEvictor's pod filtering. @isekiro would it be possible to share the pod manifest?

ingvagabund commented on Oct 20 '22

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented on Jan 18 '23

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented on Feb 17 '23

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot commented on Mar 19 '23

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to the /close not-planned command above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot commented on Mar 19 '23