
Namespace filter is not shown as a reason the pod shouldn't be evicted

migueleliasweb opened this issue 2 years ago • 5 comments

What version of descheduler are you using?

descheduler version: v0.25.1

Does this issue reproduce with the latest release?

Yes

Which descheduler CLI options are you using?

  - args:
    - --policy-config-file
    - /policy-dir/policy.yaml
    - --descheduling-interval
    - 5m
    - --v
    - "4"
    command:
    - /bin/descheduler

Please provide a copy of your descheduler policy config file

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
evictLocalStoragePods: true
maxNoOfPodsToEvictPerNode: 1
strategies:
  "PodLifeTime":
    enabled: true
    params:
      podLifeTime:
        maxPodLifeTimeSeconds: 3600
      labelSelector:
        matchLabels:
          app: foo
      namespaces:
        include:
        - foo

What k8s version are you using (kubectl version)?

$ kubectl version
Client: v1.23 Server: v1.22

What did you do?

Deployed the descheduler with the above policy and ran it at verbosity level 4.

What did you expect to see?

The default evictor seems to be unaware of the namespace filter, so it doesn't mention it in the checks list.

See: https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/defaultevictor/defaultevictor.go#L166

Also, the pod lifetime plugin wouldn't be able to output that information as it filters for the namespace early on: https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/podlifetime/pod_lifetime.go#L61

# from the logs
defaultevictor.go:188] "Pod fails the following checks" pod="kube-system/foobar" checks="[pod is a DaemonSet pod, pod has system critical priority, pod has higher priority than specified priority class threshold, pod has local storage and descheduler is not configured with evictLocalStoragePods]"
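The checks pattern in that log line can be sketched as follows. This is a minimal, self-contained illustration of how a namespace constraint could be surfaced as one more check; `pod`, `check`, `namespaceConstraint`, and `failedChecks` are hypothetical names, not the actual descheduler API:

```go
package main

import "fmt"

// Hypothetical stand-in for the descheduler's pod object.
type pod struct {
	name      string
	namespace string
}

// A check returns a non-empty reason when the pod fails it.
type check func(p pod) string

// namespaceConstraint fails pods whose namespace is not in the include set,
// mirroring the namespace filter this issue asks the evictor to report.
func namespaceConstraint(include map[string]bool) check {
	return func(p pod) string {
		if len(include) > 0 && !include[p.namespace] {
			return fmt.Sprintf("pod namespace %q is not in the included namespaces", p.namespace)
		}
		return ""
	}
}

// failedChecks collects every reason a pod cannot be evicted, like the
// "Pod fails the following checks" log line above.
func failedChecks(p pod, checks []check) []string {
	var reasons []string
	for _, c := range checks {
		if msg := c(p); msg != "" {
			reasons = append(reasons, msg)
		}
	}
	return reasons
}

func main() {
	include := map[string]bool{"foo": true}
	p := pod{name: "foobar", namespace: "kube-system"}
	fmt.Println(failedChecks(p, []check{namespaceConstraint(include)}))
	// prints: [pod namespace "kube-system" is not in the included namespaces]
}
```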

What did you see instead?

Something mentioning that the pod wouldn't be evicted because its namespace didn't match the filtering requirements.

migueleliasweb avatar Oct 05 '22 04:10 migueleliasweb

As you pointed out, we do use the namespace filter early on in most strategies. Ideally, the evictor shouldn't even know about the namespace filters because all the pods from those namespaces were already filtered (though I think this might vary by strategy, i.e. some filter before doing calculations, others do the calculations and then filter before eviction). Either way, we should be able to log which namespaces are filtered.

I think the best approach for consistency across strategies is that the "Pod fails the following checks" line shouldn't even show up for namespaces that aren't included in the filter. The evictor (or strategy) should still log that those namespaces are excluded/included, but it's weird that we're doing all these checks on a pod that has already been filtered out.

damemi avatar Oct 05 '22 14:10 damemi

Hi @damemi , thanks for the swift response.

You've nailed the problem here.

What threw me off were the many log lines mentioning pods that (given the policy's namespace filter) shouldn't even be considered. At the same time, it might be the right thing to do: when someone is testing a policy, they need to know the reasons for all pods so they can actually troubleshoot any issues.

I wonder if just adding the namespace as a check in the default evictor would be enough to make the message more useful or if something else needs to be worked out.

migueleliasweb avatar Oct 05 '22 20:10 migueleliasweb

I would be happy to attempt to implement the namespace as a check in the evictor. Let me know if you're happy with this idea so I can craft a PR.

migueleliasweb avatar Oct 05 '22 20:10 migueleliasweb

@migueleliasweb sure, thanks for offering to help. There's definitely something to be fixed here. Either we need to update the evictor checks, or ideally these pods should never even make it to the evictor if they're filtered in the ListPodsOnANode stage. Maybe a combination of the two.
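A minimal sketch of that pre-filtering idea, with illustrative names (`podInfo` and `filterByNamespace` are hypothetical, not the actual ListPodsOnANode signature):

```go
package main

import "fmt"

// Hypothetical stand-in for the pod metadata a strategy works with.
type podInfo struct {
	name      string
	namespace string
}

// filterByNamespace keeps only pods in the included namespaces, so excluded
// pods never reach the evictor's checks; an empty include set passes everything.
func filterByNamespace(pods []podInfo, include map[string]bool) []podInfo {
	var kept []podInfo
	for _, p := range pods {
		if len(include) == 0 || include[p.namespace] {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	pods := []podInfo{
		{name: "foobar", namespace: "kube-system"},
		{name: "app-1", namespace: "foo"},
	}
	for _, p := range filterByNamespace(pods, map[string]bool{"foo": true}) {
		fmt.Printf("%s/%s\n", p.namespace, p.name)
	}
	// prints: foo/app-1
}
```

Filtering at this stage would also make a natural place to log the excluded namespaces once, instead of per-pod check failures.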

damemi avatar Oct 06 '22 14:10 damemi

@damemi I just pushed this https://github.com/kubernetes-sigs/descheduler/pull/976.

Let me know your thoughts. :+1:

migueleliasweb avatar Oct 07 '22 12:10 migueleliasweb

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 05 '23 12:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 04 '23 13:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 06 '23 13:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to the triage bot's `/close not-planned` command above:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 06 '23 13:03 k8s-ci-robot