descheduler
podLifeTime ignoring NotReady nodes
What version of descheduler are you using?
descheduler version: 0.24.1
Does this issue reproduce with the latest release? Yes
Please provide a copy of your descheduler policy config file
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  PodLifeTime:
    enabled: true
    params:
      podLifeTime:
        maxPodLifeTimeSeconds: 600
        podStatusPhases:
        - "Pending"
What k8s version are you using (kubectl version)?
$ kubectl version
v1.22.8
What did you do? Sometimes I get into situations where I have a pod that sits in Pending state for hours, waiting on a node that is NotReady.
$ kubectl get pods | grep -i Pending
abc-59f79577d9-n7rln 0/1 Pending 0 26m
Describing that pod shows me:
Normal Scheduled 22m default-scheduler Successfully assigned kube-system/abc-59f79577d9-n7rln to node12
Checking that node shows me:
$ kubectl get nodes | grep node12
NAME STATUS ROLES AGE VERSION
node12 NotReady <none> 15d v1.22.8
What did you expect to see? Descheduler with podLifeTime enabled should delete that pod so it would be scheduled on another node.
What did you see instead? podLifeTime ignores pods on NotReady nodes. The log does not even show it processing the NotReady node.
$ kubectl logs abc--1-7rjnf | grep -i lifet
I0603 13:27:15.032376 1 pod_lifetime.go:104] "Processing node" node="node11"
I0603 13:27:15.032394 1 pod_lifetime.go:104] "Processing node" node="nodem1"
I0603 13:27:15.032432 1 pod_lifetime.go:104] "Processing node" node="node10"
...
If I manually delete that node, the pod is immediately scheduled on another node, as expected.
This seems like a reasonable use case, and I don't think I've seen it requested before. But I'm also interested in how the pod got scheduled to the NotReady node in the first place. Is the node not marked Unschedulable? This is important because evicting the pod could just cause a scheduling/eviction hotloop where it just lands back on the same node.
From an implementation standpoint, this happens because we calculate a single set of ReadyNodes, which is used both for eviction consideration and as the set of potential placement candidates (even though the descheduler doesn't actually re-schedule any pods, it still optimizes by only choosing pods that have a potential placement candidate). So we would need to separate this into two lists: one for eviction and one for placement.
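To make that concrete, here is a minimal sketch of the proposed split, assuming only the core/v1 node API; names like splitNodes and isReady are illustrative, not the descheduler's actual helpers:

package nodeutil

import (
	corev1 "k8s.io/api/core/v1"
)

// splitNodes partitions the cluster's nodes into the two lists described
// above: every node is considered for *eviction* (including NotReady ones,
// so a Pending pod stuck on a NotReady node can still be evicted), while
// only Ready nodes count as potential *placement* candidates.
func splitNodes(nodes []*corev1.Node) (evictionNodes, placementNodes []*corev1.Node) {
	for _, node := range nodes {
		evictionNodes = append(evictionNodes, node)
		if isReady(node) {
			placementNodes = append(placementNodes, node)
		}
	}
	return evictionNodes, placementNodes
}

// isReady reports whether the node's NodeReady condition is True.
func isReady(node *corev1.Node) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}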
I have no idea how the pod got scheduled there, but the scenario happens at least once or twice a week. I have a few nodes running in a remote location and the link is not very reliable, so those nodes sometimes go NotReady for a while. Eventually they come back and the pod starts, but since the link can sometimes be down for a long time, it would be nice if the descheduler could speed up that reschedule.
Every single time I have manually deleted the pod, it got scheduled on a ready node, so I don't think it would create a hotloop.
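For reference, the manual workaround is simply deleting the stuck pod (using the pod name from the example above); its ReplicaSet recreates it and the scheduler places the replacement on a Ready node:

$ kubectl delete pod abc-59f79577d9-n7rln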
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.