Improve podEvictor statistics
As suggested in https://github.com/kubernetes-sigs/descheduler/issues/501#issuecomment-781967812, it would be nice to improve the pod evictor type to report eviction statistics for individual strategies. Some suggestions were:
- Number of evicted pods (in this strategy): XX
- Number of evicted pods in this run: XX
- Total number of evicted pods in all strategies: XX
x-ref: this could also be reported as Prometheus metrics (https://github.com/kubernetes-sigs/descheduler/issues/348)
I prefer to report the statistics as metrics, so we don't have to accumulate much in the pod evictor itself.
I am just suggesting that since those metrics will have to be calculated somewhere, doing it in the podEvictor makes sense because it already has access to the information. Metrics can then use the podEvictor instance to report them when requested.
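For illustration, here is a minimal sketch of that idea (the field and method names are hypothetical, not the actual PodEvictor API): the evictor already sees every eviction, so it can keep per-strategy counters that metrics reporting or the end-of-run summary can read later.

type PodEvictor struct {
    totalEvicted      int
    evictedByStrategy map[string]int // strategy name -> pods evicted in this run
}

// recordEviction is a hypothetical hook called after each successful eviction.
func (pe *PodEvictor) recordEviction(strategy string) {
    if pe.evictedByStrategy == nil {
        pe.evictedByStrategy = map[string]int{}
    }
    pe.evictedByStrategy[strategy]++
    pe.totalEvicted++
}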
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
@damemi I would be happy to contribute to this. Any docs highlighting the decision made to-date?
@a7i nothing concrete, though if you would like to put some ideas together and share a doc that would be a great place to start the discussion. Right now we have 1 metric pods_evicted that's reported by the PodEvictor after a run.
As suggested above, it would be good to have some similar reports on a per-strategy basis. From there we could probably even come up with some additional meta metrics that are specific to the different strategies themselves.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@ingvagabund: Reopened this issue.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
I'd like to work on this issue if someone is not working on it already 🙂 @damemi @ingvagabund
Not aware of anyone working on this atm, although this requires some design and probably starting some discussion (e.g. in a Google doc). @damemi wdyt?
Yeah I think we have some good patterns started in the code for metrics reporting already that could be fleshed out more. @pravarag feel free to take this on if you'd like
It would be great to have the following:
- pods evicted success
- pods evicted failed
- pods skipped
- total pods under consideration
Overall and per strategy.
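A hedged sketch of how those four could be exposed via a single counter vector (assuming k8s.io/component-base/metrics and the existing DeschedulerSubsystem constant; the metric and label names here are made up): with a strategy label, per-strategy and overall totals can both be queried, and the total number of pods under consideration is the sum across the result label.

// Hypothetical metric: one counter vector covering evicted/failed/skipped pods,
// labelled by strategy so both overall and per-strategy totals can be derived.
var PodsProcessedTotal = metrics.NewCounterVec(
    &metrics.CounterOpts{
        Subsystem:      DeschedulerSubsystem,
        Name:           "pods_processed_total",
        Help:           "Pods considered for eviction, by strategy and result (evicted, failed, skipped).",
        StabilityLevel: metrics.ALPHA,
    },
    []string{"strategy", "result"},
)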
/assign
@damemi @ingvagabund I'm trying to replicate pod evictions in a local cluster to better understand how the statistics are currently represented. I have a 3-node cluster whose resources are not heavily utilized; the node stats are:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
10.177.140.38 161m 4% 3460Mi 26%
10.208.40.245 182m 4% 3849Mi 28%
10.74.193.204 149m 3% 4002Mi 30%
And here are the logs from the descheduler pod:
-> k logs descheduler-7bdbc8f9b7-d9r46 -nkube-system
I0920 14:27:38.995798 1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1632148058\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1632148058\" (2021-09-20 13:27:38 +0000 UTC to 2022-09-20 13:27:38 +0000 UTC (now=2021-09-20 14:27:38.995739889 +0000 UTC))"
I0920 14:27:38.995912 1 secure_serving.go:195] Serving securely on [::]:10258
I0920 14:27:38.996045 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0920 14:27:40.554774 1 node.go:46] "Node lister returned empty list, now fetch directly"
I0920 14:27:40.973812 1 duplicates.go:99] "Processing node" node="10.177.140.38"
I0920 14:27:41.225473 1 duplicates.go:99] "Processing node" node="10.208.40.245"
I0920 14:27:41.500405 1 duplicates.go:99] "Processing node" node="10.74.193.204"
I0920 14:27:41.717340 1 pod_antiaffinity.go:81] "Processing node" node="10.177.140.38"
I0920 14:27:41.823705 1 pod_antiaffinity.go:81] "Processing node" node="10.208.40.245"
I0920 14:27:41.879063 1 pod_antiaffinity.go:81] "Processing node" node="10.74.193.204"
I0920 14:27:42.198284 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.177.140.38" usage=map[cpu:1172m memory:1327634Ki pods:20] usagePercentage=map[cpu:29.974424552429667 memory:9.74255252448638 pods:18.181818181818183]
I0920 14:27:42.198333 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.208.40.245" usage=map[cpu:1044m memory:1137170Ki pods:12] usagePercentage=map[cpu:26.70076726342711 memory:8.344874004635447 pods:10.909090909090908]
I0920 14:27:42.198354 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.74.193.204" usage=map[cpu:1355m memory:1552914Ki pods:15] usagePercentage=map[cpu:34.65473145780051 memory:11.395720666245547 pods:13.636363636363637]
I0920 14:27:42.198369 1 lownodeutilization.go:100] "Criteria for a node under utilization" CPU=20 Mem=20 Pods=20
I0920 14:27:42.198380 1 lownodeutilization.go:101] "Number of underutilized nodes" totalNumber=0
I0920 14:27:42.198392 1 lownodeutilization.go:114] "Criteria for a node above target utilization" CPU=50 Mem=50 Pods=50
I0920 14:27:42.198403 1 lownodeutilization.go:115] "Number of overutilized nodes" totalNumber=0
I0920 14:27:42.198415 1 lownodeutilization.go:118] "No node is underutilized, nothing to do here, you might tune your thresholds further"
I0920 14:27:42.198439 1 descheduler.go:152] "Number of evicted pods" totalEvicted=0
I0920 14:32:42.198973 1 node.go:46] "Node lister returned empty list, now fetch directly"
I0920 14:32:42.261831 1 pod_antiaffinity.go:81] "Processing node" node="10.177.140.38"
I0920 14:32:42.295166 1 pod_antiaffinity.go:81] "Processing node" node="10.208.40.245"
I0920 14:32:42.336749 1 pod_antiaffinity.go:81] "Processing node" node="10.74.193.204"
I0920 14:32:42.479844 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.177.140.38" usage=map[cpu:1172m memory:1327634Ki pods:20] usagePercentage=map[cpu:29.974424552429667 memory:9.74255252448638 pods:18.181818181818183]
I0920 14:32:42.479892 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.208.40.245" usage=map[cpu:1044m memory:1137170Ki pods:12] usagePercentage=map[cpu:26.70076726342711 memory:8.344874004635447 pods:10.909090909090908]
I0920 14:32:42.479914 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.74.193.204" usage=map[cpu:1355m memory:1552914Ki pods:15] usagePercentage=map[cpu:34.65473145780051 memory:11.395720666245547 pods:13.636363636363637]
I0920 14:32:42.479930 1 lownodeutilization.go:100] "Criteria for a node under utilization" CPU=20 Mem=20 Pods=20
I0920 14:32:42.479941 1 lownodeutilization.go:101] "Number of underutilized nodes" totalNumber=0
I0920 14:32:42.479953 1 lownodeutilization.go:114] "Criteria for a node above target utilization" CPU=50 Mem=50 Pods=50
I0920 14:32:42.479963 1 lownodeutilization.go:115] "Number of overutilized nodes" totalNumber=0
I0920 14:32:42.479982 1 lownodeutilization.go:118] "No node is underutilized, nothing to do here, you might tune your thresholds further"
I0920 14:32:42.480009 1 duplicates.go:99] "Processing node" node="10.177.140.38"
I0920 14:32:42.516420 1 duplicates.go:99] "Processing node" node="10.208.40.245"
I0920 14:32:42.549396 1 duplicates.go:99] "Processing node" node="10.74.193.204"
I0920 14:32:42.595868 1 descheduler.go:152] "Number of evicted pods" totalEvicted=0
I wanted to check: if I decrease the threshold values to 10 here, would that be a good way to trigger pod evictions so that I can look at the current statistics logs?
@pravarag in your logs you don't have any underutilized nodes, so lowering the thresholds won't help (there are already no nodes with all 3 resources below the set values). Instead, you want to raise the threshold values, so anything with usage under those values will be considered underutilized.
You also don't have any overutilized nodes, so you should lower the targetThresholds as well. For replicating evictions, cordoning certain nodes while you create test pods will help create the uneven distribution you want.
Thanks @damemi for the above suggestions. I also had a few questions about adding new metrics. I've identified that the changes will mainly take place in these files:
- metrics.go - which will define the new metrics we want to add.
- evictions.go - where the new metrics will be calculated, just like for pods_evicted.
Now, do we also want to modify the logging for the new metrics that are to be added? Something to include in every strategy, like this log?
And one more question: I could see that for the metric pods_evicted, the help text says we can calculate the number of pods evicted per strategy and per namespace as well. I'm guessing the code for that calculation still needs to be added, so do we need an extra metric per strategy, like pods_evicted_per_strategy?
So far, I'm working on adding a few new metrics like pods_evicted_success, pods_evicted_failed, and pods_skipped.
@pravarag I think that all sounds good, except we probably don't need a new log line for every new metric. An extra metric per strategy would be good to get pods_evicted_per_strategy, but you can probably just make that one metric with a different label value for each strategy (that might be what you meant though).
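For what it's worth, assuming the pods_evicted counter vector gained a "strategy" label (the label set below is an assumption, not the current metric definition), the call site stays a single metric rather than one pods_evicted_<strategy> metric per strategy:

// recordPodEviction is a hypothetical helper; PodsEvicted is assumed to be a
// *metrics.CounterVec registered with the label names
// []string{"result", "strategy", "namespace", "node"}.
func recordPodEviction(result, strategy, namespace, node string) {
    PodsEvicted.WithLabelValues(result, strategy, namespace, node).Inc()
}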
@damemi, I've created a draft pull request while I continue to make a few more changes.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
It would be great to have the following:
- pods evicted success
- pods evicted failed
- pods skipped
- total pods under consideration
Overall and per strategy.
We would like to help implement the above metrics! @eminaktas
There is a closed PR https://github.com/kubernetes-sigs/descheduler/pull/648 that was submitted by @pravarag, and I think it was closed in favor of the new descheduling framework, as also discussed in https://github.com/kubernetes-sigs/descheduler/issues/753#issuecomment-1150133689.
How should we help/proceed here to implement these metrics? Are there any examples of creating a plugin from scratch using the new descheduling framework?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Kind ping @pravarag 🤞 Any ongoing work on this?
It sounds like we've come back around to not needing single-run metrics. I think per-strategy metrics could be a good option though.
I think this is a really cool idea! Any implementation ideas in mind? Are we supposed to create metrics per strategy? Like pods_evicted_STRATEGY etc.
Solution 1: Without duplication (use map)
// Assumes this lives in the descheduler's existing metrics package, which already
// imports k8s.io/component-base/metrics and legacyregistry and defines DeschedulerSubsystem.
var podsEvicted = map[string]*metrics.CounterVec{}

func registerForStrategy(strategy string) {
    metric := metrics.NewCounterVec(
        &metrics.CounterOpts{
            Subsystem:      DeschedulerSubsystem,
            Name:           fmt.Sprintf("pods_evicted_%s", strategy),
            Help:           "Number of evicted pods, by the result, by the namespace, by the node name. 'error' result means a pod could not be evicted",
            StabilityLevel: metrics.ALPHA,
        }, []string{"result", "namespace", "node"})
    podsEvicted[strategy] = metric
    legacyregistry.MustRegister(metric)
}
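A possible call site for Solution 1 (hypothetical helper; assumes registerForStrategy(strategy) has already run for the strategy in question), matching the []string{"result", "namespace", "node"} label order above:

// incrementEvicted bumps the per-strategy counter registered above; result
// would be e.g. "success" or "error" per the Help text.
func incrementEvicted(strategy, result, namespace, node string) {
    if vec, ok := podsEvicted[strategy]; ok {
        vec.WithLabelValues(result, namespace, node).Inc()
    }
}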
Solution 2: With duplication (create CounterVec for each strategy)
PodsEvictedHighNodeUtilization = metrics.NewCounterVec
PodsEvictedLowNodeUtilization = metrics.NewCounterVec
PodsEvictedPodLifeTime = metrics.NewCounterVec
...
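For comparison, here is Solution 2 spelled out for a single strategy (assumed metric name and labels); every other strategy would need its own near-identical copy:

// One of the duplicated declarations Solution 2 implies; LowNodeUtilization,
// PodLifeTime, etc. would each repeat this block with a different Name.
var PodsEvictedHighNodeUtilization = metrics.NewCounterVec(
    &metrics.CounterOpts{
        Subsystem:      DeschedulerSubsystem,
        Name:           "pods_evicted_high_node_utilization",
        Help:           "Number of evicted pods by the HighNodeUtilization strategy, by result, namespace and node.",
        StabilityLevel: metrics.ALPHA,
    },
    []string{"result", "namespace", "node"},
)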
Waiting for your thoughts!