Improve podEvictor statistics
As suggested in https://github.com/kubernetes-sigs/descheduler/issues/501#issuecomment-781967812, it would be nice to improve the pod evictor type to report eviction statistics for individual strategies. Some suggestions were:
- Number of evicted pods (in this strategy): XX
- Number of evicted pods in this run: XX
- Total number of evicted pods in all strategies: XX
x-ref: this could also be reported as Prometheus metrics (https://github.com/kubernetes-sigs/descheduler/issues/348)
I prefer to report the statistics as metrics, so we don't have to accumulate much in the pod evictor itself.
I am just suggesting that since those metrics will have to be calculated somewhere, doing it in the podEvictor makes sense because it already has access to the information. Metrics can then use the podEvictor instance to report them when requested.
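For illustration, here is a minimal sketch of that idea (the field and method names are hypothetical, not the actual PodEvictor API): the evictor already sees every eviction, so it can keep per-strategy counters that metrics reporting or the end-of-run summary can read later.

type PodEvictor struct {
    totalEvicted      int
    evictedByStrategy map[string]int // strategy name -> pods evicted in this run
}

// recordEviction is a hypothetical hook called after each successful eviction.
func (pe *PodEvictor) recordEviction(strategy string) {
    if pe.evictedByStrategy == nil {
        pe.evictedByStrategy = map[string]int{}
    }
    pe.evictedByStrategy[strategy]++
    pe.totalEvicted++
}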
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
@damemi I would be happy to contribute to this. Any docs highlighting the decision made to-date?
@a7i nothing concrete, though if you would like to put some ideas together and share a doc that would be a great place to start the discussion. Right now we have 1 metric pods_evicted that's reported by the PodEvictor after a run.
As suggested above, it would be good to have some similar reports on a per-strategy basis. From there we could probably even come up with some additional meta metrics that are specific to the different strategies themselves.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@ingvagabund: Reopened this issue.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
I'd like to work on this issue if someone is not working on it already 🙂 @damemi @ingvagabund
Not aware of anyone working on this atm, although this requires some design and probably starting some discussion (e.g. in a Google doc). @damemi wdyt?
Yeah I think we have some good patterns started in the code for metrics reporting already that could be fleshed out more. @pravarag feel free to take this on if you'd like
It would be great to have the following:
- pods evicted success
- pods evicted failed
- pods skipped
- total pods under consideration
Overall and per strategy.
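A hedged sketch of how those four could be exposed via a single counter vector (assuming k8s.io/component-base/metrics and the existing DeschedulerSubsystem constant; the metric and label names here are made up): with a strategy label, per-strategy and overall totals can both be queried, and the total number of pods under consideration is the sum across the result label.

// Hypothetical metric: one counter vector covering evicted/failed/skipped pods,
// labelled by strategy so both overall and per-strategy totals can be derived.
var PodsProcessedTotal = metrics.NewCounterVec(
    &metrics.CounterOpts{
        Subsystem:      DeschedulerSubsystem,
        Name:           "pods_processed_total",
        Help:           "Pods considered for eviction, by strategy and result (evicted, failed, skipped).",
        StabilityLevel: metrics.ALPHA,
    },
    []string{"strategy", "result"},
)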
/assign
@damemi @ingvagabund I'm trying to replicate pod evictions in a local cluster to better understand how the statistics are currently represented. I have a 3-node cluster whose resources are not heavily utilized; the node stats are:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
10.177.140.38 161m 4% 3460Mi 26%
10.208.40.245 182m 4% 3849Mi 28%
10.74.193.204 149m 3% 4002Mi 30%
And here are the logs from the descheduler pod:
-> k logs descheduler-7bdbc8f9b7-d9r46 -nkube-system
I0920 14:27:38.995798 1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1632148058\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1632148058\" (2021-09-20 13:27:38 +0000 UTC to 2022-09-20 13:27:38 +0000 UTC (now=2021-09-20 14:27:38.995739889 +0000 UTC))"
I0920 14:27:38.995912 1 secure_serving.go:195] Serving securely on [::]:10258
I0920 14:27:38.996045 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0920 14:27:40.554774 1 node.go:46] "Node lister returned empty list, now fetch directly"
I0920 14:27:40.973812 1 duplicates.go:99] "Processing node" node="10.177.140.38"
I0920 14:27:41.225473 1 duplicates.go:99] "Processing node" node="10.208.40.245"
I0920 14:27:41.500405 1 duplicates.go:99] "Processing node" node="10.74.193.204"
I0920 14:27:41.717340 1 pod_antiaffinity.go:81] "Processing node" node="10.177.140.38"
I0920 14:27:41.823705 1 pod_antiaffinity.go:81] "Processing node" node="10.208.40.245"
I0920 14:27:41.879063 1 pod_antiaffinity.go:81] "Processing node" node="10.74.193.204"
I0920 14:27:42.198284 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.177.140.38" usage=map[cpu:1172m memory:1327634Ki pods:20] usagePercentage=map[cpu:29.974424552429667 memory:9.74255252448638 pods:18.181818181818183]
I0920 14:27:42.198333 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.208.40.245" usage=map[cpu:1044m memory:1137170Ki pods:12] usagePercentage=map[cpu:26.70076726342711 memory:8.344874004635447 pods:10.909090909090908]
I0920 14:27:42.198354 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.74.193.204" usage=map[cpu:1355m memory:1552914Ki pods:15] usagePercentage=map[cpu:34.65473145780051 memory:11.395720666245547 pods:13.636363636363637]
I0920 14:27:42.198369 1 lownodeutilization.go:100] "Criteria for a node under utilization" CPU=20 Mem=20 Pods=20
I0920 14:27:42.198380 1 lownodeutilization.go:101] "Number of underutilized nodes" totalNumber=0
I0920 14:27:42.198392 1 lownodeutilization.go:114] "Criteria for a node above target utilization" CPU=50 Mem=50 Pods=50
I0920 14:27:42.198403 1 lownodeutilization.go:115] "Number of overutilized nodes" totalNumber=0
I0920 14:27:42.198415 1 lownodeutilization.go:118] "No node is underutilized, nothing to do here, you might tune your thresholds further"
I0920 14:27:42.198439 1 descheduler.go:152] "Number of evicted pods" totalEvicted=0
I0920 14:32:42.198973 1 node.go:46] "Node lister returned empty list, now fetch directly"
I0920 14:32:42.261831 1 pod_antiaffinity.go:81] "Processing node" node="10.177.140.38"
I0920 14:32:42.295166 1 pod_antiaffinity.go:81] "Processing node" node="10.208.40.245"
I0920 14:32:42.336749 1 pod_antiaffinity.go:81] "Processing node" node="10.74.193.204"
I0920 14:32:42.479844 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.177.140.38" usage=map[cpu:1172m memory:1327634Ki pods:20] usagePercentage=map[cpu:29.974424552429667 memory:9.74255252448638 pods:18.181818181818183]
I0920 14:32:42.479892 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.208.40.245" usage=map[cpu:1044m memory:1137170Ki pods:12] usagePercentage=map[cpu:26.70076726342711 memory:8.344874004635447 pods:10.909090909090908]
I0920 14:32:42.479914 1 nodeutilization.go:170] "Node is appropriately utilized" node="10.74.193.204" usage=map[cpu:1355m memory:1552914Ki pods:15] usagePercentage=map[cpu:34.65473145780051 memory:11.395720666245547 pods:13.636363636363637]
I0920 14:32:42.479930 1 lownodeutilization.go:100] "Criteria for a node under utilization" CPU=20 Mem=20 Pods=20
I0920 14:32:42.479941 1 lownodeutilization.go:101] "Number of underutilized nodes" totalNumber=0
I0920 14:32:42.479953 1 lownodeutilization.go:114] "Criteria for a node above target utilization" CPU=50 Mem=50 Pods=50
I0920 14:32:42.479963 1 lownodeutilization.go:115] "Number of overutilized nodes" totalNumber=0
I0920 14:32:42.479982 1 lownodeutilization.go:118] "No node is underutilized, nothing to do here, you might tune your thresholds further"
I0920 14:32:42.480009 1 duplicates.go:99] "Processing node" node="10.177.140.38"
I0920 14:32:42.516420 1 duplicates.go:99] "Processing node" node="10.208.40.245"
I0920 14:32:42.549396 1 duplicates.go:99] "Processing node" node="10.74.193.204"
I0920 14:32:42.595868 1 descheduler.go:152] "Number of evicted pods" totalEvicted=0
I wanted to check: if I decrease the threshold values to 10 here, would that be a good way to trigger pod evictions so that I can look at the current statistics logs?
@pravarag in your logs you don't have any underutilized nodes, so lowering the thresholds won't help (there are already no nodes with all 3 resources below the set values). Instead, you want to raise the threshold values, so anything with usage under those values will be considered underutilized.
You also don't have any overutilized nodes, so you should lower the targetThresholds as well. For replicating evictions, cordoning certain nodes while you create test pods will help create the uneven distribution you want.
Thanks @damemi for the above suggestions. I also had a few questions about adding new metrics. I've identified that the changes will mainly take place in these files:
- metrics.go - which will define the new metrics we want to add.
- evictions.go - where the new metrics will be calculated, just like for pods_evicted.
Now, do we also want to modify the logging for the new metrics that are to be added? Something to include in every strategy, like this log?
And one more question: I could see that for the metric pods_evicted, the help text says we can calculate the number of pods evicted per strategy and per namespace as well. I'm guessing the code for that calculation still needs to be added, so do we need an extra metric per strategy, like pods_evicted_per_strategy?
So far, I'm working on adding a few new metrics like pods_evicted_success, pods_evicted_failed, and pods_skipped.
@pravarag I think that all sounds good, except we probably don't need a new log line for every new metric. An extra metric per strategy would be good to get pods_evicted_per_strategy, but you can probably just make that one metric with a different label value for each strategy (that might be what you meant though).
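For what it's worth, assuming the pods_evicted counter vector gained a "strategy" label (the label set below is an assumption, not the current metric definition), the call site stays a single metric rather than one pods_evicted_<strategy> metric per strategy:

// recordPodEviction is a hypothetical helper; PodsEvicted is assumed to be a
// *metrics.CounterVec registered with the label names
// []string{"result", "strategy", "namespace", "node"}.
func recordPodEviction(result, strategy, namespace, node string) {
    PodsEvicted.WithLabelValues(result, strategy, namespace, node).Inc()
}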
@damemi, I've created a draft pull request while I continue to make a few more changes.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
It would be great to have the following:
- pods evicted success
- pods evicted failed
- pods skipped
- total pods under consideration
Overall and per strategy.
We would like to help implement the above metrics! @eminaktas
There is a closed PR https://github.com/kubernetes-sigs/descheduler/pull/648 that was submitted by @pravarag, and I think it was closed in favor of the new descheduling framework, as also discussed in https://github.com/kubernetes-sigs/descheduler/issues/753#issuecomment-1150133689.
How should we help/proceed here to implement these metrics? Are there any examples of creating a plugin from scratch using the new descheduling framework?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Kind ping @pravarag 🤞 Any ongoing work on this?
It sounds like we've come back around to not needing single-run metrics. I think per-strategy metrics could be a good option though.
I think this is a really cool idea! Any implementation ideas in mind? Are we supposed to create metrics per strategy? Like pods_evicted_STRATEGY etc.
Solution 1: Without duplication (use map)
// Assumes this lives in the descheduler's existing metrics package, which already
// imports k8s.io/component-base/metrics and legacyregistry and defines DeschedulerSubsystem.
var podsEvicted = map[string]*metrics.CounterVec{}

func registerForStrategy(strategy string) {
    metric := metrics.NewCounterVec(
        &metrics.CounterOpts{
            Subsystem:      DeschedulerSubsystem,
            Name:           fmt.Sprintf("pods_evicted_%s", strategy),
            Help:           "Number of evicted pods, by the result, by the namespace, by the node name. 'error' result means a pod could not be evicted",
            StabilityLevel: metrics.ALPHA,
        }, []string{"result", "namespace", "node"})
    podsEvicted[strategy] = metric
    legacyregistry.MustRegister(metric)
}
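A possible call site for Solution 1 (hypothetical helper; assumes registerForStrategy(strategy) has already run for the strategy in question), matching the []string{"result", "namespace", "node"} label order above:

// incrementEvicted bumps the per-strategy counter registered above; result
// would be e.g. "success" or "error" per the Help text.
func incrementEvicted(strategy, result, namespace, node string) {
    if vec, ok := podsEvicted[strategy]; ok {
        vec.WithLabelValues(result, namespace, node).Inc()
    }
}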
Solution 2: With duplication (create CounterVec for each strategy)
PodsEvictedHighNodeUtilization = metrics.NewCounterVec
PodsEvictedLowNodeUtilization = metrics.NewCounterVec
PodsEvictedPodLifeTime = metrics.NewCounterVec
...
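For comparison, here is Solution 2 spelled out for a single strategy (assumed metric name and labels); every other strategy would need its own near-identical copy:

// One of the duplicated declarations Solution 2 implies; LowNodeUtilization,
// PodLifeTime, etc. would each repeat this block with a different Name.
var PodsEvictedHighNodeUtilization = metrics.NewCounterVec(
    &metrics.CounterOpts{
        Subsystem:      DeschedulerSubsystem,
        Name:           "pods_evicted_high_node_utilization",
        Help:           "Number of evicted pods by the HighNodeUtilization strategy, by result, namespace and node.",
        StabilityLevel: metrics.ALPHA,
    },
    []string{"result", "namespace", "node"},
)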
Waiting for your thoughts!