descheduler
Namespace filter is not shown as a reason the pod shouldn't be evicted
What version of descheduler are you using?
descheduler version: v0.25.1
Does this issue reproduce with the latest release?
Yes
Which descheduler CLI options are you using?
- args:
- --policy-config-file
- /policy-dir/policy.yaml
- --descheduling-interval
- 5m
- --v
- "4"
command:
- /bin/descheduler
Please provide a copy of your descheduler policy config file
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
evictLocalStoragePods: true
maxNoOfPodsToEvictPerNode: 1
strategies:
  "PodLifeTime":
    enabled: true
    params:
      podLifeTime:
        maxPodLifeTimeSeconds: 3600
      labelSelector:
        matchLabels:
          app: foo
      namespaces:
        include:
          - foo
What k8s version are you using (kubectl version)?
$ kubectl version
Client: v1.23
Server: v1.22
What did you do?
Deployed the descheduler with the above policy and used verbosity level 4.
What did you expect to see?
The default evictor seems to be unaware of the namespace filter, so it doesn't mention it in the checks list.
See: https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/defaultevictor/defaultevictor.go#L166
Also, the pod lifetime plugin wouldn't be able to output that information, as it filters by namespace early on: https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/podlifetime/pod_lifetime.go#L61
# from the logs
defaultevictor.go:188] "Pod fails the following checks" pod="kube-system/foobar" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod has local storage and descheduler is not configured with evictLocalStoragePods]"
What did you see instead?
Something mentioning that the pod wouldn't be evicted because its namespace didn't match the filtering requirements.
As you pointed out, we do use the namespace filter early on in most strategies. Ideally, the evictor shouldn't even know about the namespace filters because all the pods from those namespaces were already filtered (though I think this might vary by strategy, i.e. some filter before doing calculations, some do calculations and then filter before eviction). Either way, we should be able to log which namespaces are filtered.
I think the best approach for consistency across strategies is that the "Pod fails the following checks" line shouldn't even show up for namespaces that aren't included in the filter. The evictor (or strategy) should still log that those namespaces are excluded/included, but it's weird that we're doing all these checks on a pod that is filtered out already.
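To illustrate that early-filtering pattern, here is a minimal, self-contained sketch; the helper names (filterByNamespace, exceedsLifetime) are illustrative only and not the descheduler's actual code:

```go
// Illustrative sketch only, not the descheduler's actual code: strategies that
// take a namespaces.include list drop pods from other namespaces before any
// eviction checks run, so the evictor never sees (or logs) those pods.
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// filterByNamespace keeps only pods whose namespace is in the include set.
func filterByNamespace(pods []*v1.Pod, include map[string]struct{}) []*v1.Pod {
	if len(include) == 0 {
		return pods // no namespace filter configured
	}
	var out []*v1.Pod
	for _, p := range pods {
		if _, ok := include[p.Namespace]; ok {
			out = append(out, p)
		}
	}
	return out
}

// exceedsLifetime mimics a PodLifeTime-style age check.
func exceedsLifetime(p *v1.Pod, maxSeconds int64) bool {
	return time.Since(p.CreationTimestamp.Time) > time.Duration(maxSeconds)*time.Second
}

func main() {
	twoHoursOld := metav1.NewTime(time.Now().Add(-2 * time.Hour))
	pods := []*v1.Pod{
		{ObjectMeta: metav1.ObjectMeta{Name: "foobar", Namespace: "kube-system", CreationTimestamp: twoHoursOld}},
		{ObjectMeta: metav1.ObjectMeta{Name: "foo-1", Namespace: "foo", CreationTimestamp: twoHoursOld}},
	}
	include := map[string]struct{}{"foo": {}}

	// kube-system/foobar is dropped here, before any evictability checks run,
	// which is why no namespace-related reason appears in the evictor's log line.
	for _, p := range filterByNamespace(pods, include) {
		if exceedsLifetime(p, 3600) {
			fmt.Printf("candidate for eviction: %s/%s\n", p.Namespace, p.Name)
		}
	}
}
```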
Hi @damemi, thanks for the swift response.
You've nailed the problem here.
What threw me off were the many lines mentioning pods that (given the policy's namespace filter) shouldn't even be considered. At the same time, it might be the right thing to do: when someone is testing a policy, they need to know the reasons for all pods so they can actually troubleshoot any issues.
I wonder if just adding the namespace as a check in the default evictor would be enough to make the message more useful or if something else needs to be worked out.
I would be happy to attempt to implement the namespace as a check in the evictor. Let me know if you're happy with this idea so I can craft a PR.
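For context, a rough sketch of what such a check could look like, assuming a simple constraint-function pattern; the constraint type and namespaceConstraint helper are hypothetical and not the descheduler's actual API:

```go
// Hypothetical sketch of the idea, not the descheduler's actual API: add a
// namespace constraint next to the existing evictability checks so the
// "Pod fails the following checks" message can also mention the namespace filter.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// constraint mirrors the common "check returns an error with a reason" pattern.
type constraint func(pod *v1.Pod) error

// namespaceConstraint rejects pods whose namespace is not in the include set.
func namespaceConstraint(include map[string]struct{}) constraint {
	return func(pod *v1.Pod) error {
		if len(include) == 0 {
			return nil // no namespace filter configured
		}
		if _, ok := include[pod.Namespace]; !ok {
			return fmt.Errorf("pod namespace %q is not included in the namespace filter", pod.Namespace)
		}
		return nil
	}
}

func main() {
	checks := []constraint{
		namespaceConstraint(map[string]struct{}{"foo": {}}),
		// ...the existing checks (DaemonSet, critical priority, local storage, ...) would follow here.
	}

	pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "foobar", Namespace: "kube-system"}}

	var reasons []string
	for _, check := range checks {
		if err := check(pod); err != nil {
			reasons = append(reasons, err.Error())
		}
	}
	if len(reasons) > 0 {
		fmt.Printf("Pod fails the following checks: %v\n", reasons)
	}
}
```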
@migueleliasweb sure, thanks for offering to help. There's definitely something to be fixed here. Either we need to update the evictor checks, or ideally these pods should never even make it to the evictor if they're filtered in the ListPodsOnANode stage. Maybe a combination of the two.
@damemi I just pushed this https://github.com/kubernetes-sigs/descheduler/pull/976.
Let me know your thoughts. :+1:
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.