metrics-server
kubectl top shows memory usage > 200%
What happened:

kubectl top nodes
NAME                                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-143-4-57.us-east-2.compute.internal    211m         21%    2838Mi          240%
ip-10-143-4-98.us-east-2.compute.internal    2551m        85%    12346Mi         93%
ip-10-143-5-15.us-east-2.compute.internal    599m         19%    9894Mi          75%
ip-10-143-5-223.us-east-2.compute.internal   204m         20%    2899Mi          245%
ip-10-143-6-214.us-east-2.compute.internal   553m         18%    9950Mi          75%
ip-10-143-6-243.us-east-2.compute.internal   556m         55%    3100Mi          262%
What you expected to happen:
Nodes use less than 100%
Anything else we need to know?:
Only happening on nodes with "controlplane,etcd" role.
Example node description when memory shows 245%:
Addresses:
  InternalIP:  10.143.5.223
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           31445996Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      3831048Ki
  pods:                        110
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1
  ephemeral-storage:           26833146218
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      1209608Ki
  pods:                        110
Environment:
- k8s 1.74
- metrics server version 0.4.1
Name: v1beta1.metrics.k8s.io
Namespace:
Labels:
/kind bug
Hi @joemilacek, I believe the results displayed by kubectl top nodes are correct:
- allocatable memory: 1209608Ki = 1181Mi
- memory usage: 2899Mi --> 2899 / 1181 = 245%
and 245% is exactly what you have:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-143-5-223.us-east-2.compute.internal 204m 20% 2899Mi 245%
For nodes, I believe it's the memory usage as reported by the system-wide node cgroup. So it may include stuff that's not in a pod, IIRC. We just report the information given to us by cadvisor on the node.
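If you want to see that node-level working set directly, one way (assuming you can reach the kubelet Summary API through the API server proxy and have jq installed locally; this is my sketch, not something from the original report) is to query the node's summary stats. The value there should closely match what kubectl top reports for the node:

kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/stats/summary | jq '.node.memory.workingSetBytes'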
There is another open issue to improve documentation of kubectl top and explain how percentages are calculated, so I think we could close this one as a duplicate.
I'd like to add an update on why kubectl shows that memory usage is larger than 100% and how to fix that.
Why is the percentage larger than 100%?
kubectl top node displays, by default, the ratio node_memory_working_set_bytes / Allocatable memory.
To verify it, run
  kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/resource | grep node_memory_working_set_bytes
to retrieve node_memory_working_set_bytes, and
  kubectl describe node <NODE_NAME> | grep -i capacity -A 10
to retrieve the allocatable memory.
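For the 245% node in this issue, the output of those two checks should look roughly like the sketch below (the metric value corresponds to 2899Mi; the # HELP/# TYPE lines that grep also matches and any trailing timestamp are omitted, the describe output is trimmed to the memory lines, and -A is bumped to 15 so the Allocatable memory line is included for a node with this many resource types):

kubectl get --raw /api/v1/nodes/ip-10-143-5-223.us-east-2.compute.internal/proxy/metrics/resource | grep node_memory_working_set_bytes
node_memory_working_set_bytes 3.039821824e+09

kubectl describe node ip-10-143-5-223.us-east-2.compute.internal | grep -i capacity -A 15
Capacity:
  ...
  memory:  3831048Ki
  ...
Allocatable:
  ...
  memory:  1209608Ki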
"Allocatable" is just a logical total memory size based on
[Allocatable] = [Node Capacity] - [system-reserved] - [Hard-Eviction-Thresholds]. In contrast, the node memory usage collected by Metrics API(metrics.k8s.io) is based on real use constantly on the node host. If the system-reserved or hard-eviction threshold is configured bigger, the MEMORY% can be larger than 100%.
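Plugging in the numbers from the 245% node above gives a rough picture of how much is set aside on that host (the split between kube-reserved, system-reserved, and the eviction threshold is not visible from kubectl describe node, so only the total is shown):

  Capacity memory             = 3831048Ki ≈ 3741Mi
  Allocatable memory          = 1209608Ki ≈ 1181Mi
  reserved + eviction (total) = 3741Mi - 1181Mi = 2560Mi

So about 2560Mi of a 3741Mi node is held back, and a real working set of 2899Mi comes out at 2899Mi / 1181Mi ≈ 245% of allocatable while being only 2899Mi / 3741Mi ≈ 77% of capacity.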
Please refer to this PR for the detailed explanation.
How to fix that
Starting from k8s 1.23, kubectl top node has an option --show-capacity. This option defines how the percentage is computed:
- kubectl top node (the default invocation, equivalent to kubectl top node --show-capacity=false) displays node_memory_working_set_bytes / Allocatable memory, which can exceed 100% as explained above.
- kubectl top node --show-capacity=true displays node_memory_working_set_bytes / Capacity memory, which never exceeds 100%.
From the example provided by @joemilacek, for the node showing 245% memory:
  node_memory_working_set_bytes = 2899Mi
  Allocatable memory            = 1209608Ki ≈ 1181Mi
  Capacity memory               = 3831048Ki ≈ 3741Mi
kubectl top node would show 2899Mi / 1181Mi = 245%, while kubectl top node --show-capacity=true would show 2899Mi / 3741Mi = 77%.
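As a sketch, assuming a cluster and kubectl new enough to support the flag, the two invocations for that node would print something like the following (with --show-capacity the CPU% is also recomputed, against the 2-core capacity instead of the 1-core allocatable):

kubectl top node ip-10-143-5-223.us-east-2.compute.internal
NAME                                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-143-5-223.us-east-2.compute.internal   204m         20%    2899Mi          245%

kubectl top node ip-10-143-5-223.us-east-2.compute.internal --show-capacity
NAME                                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-143-5-223.us-east-2.compute.internal   204m         10%    2899Mi          77%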
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
/label good first issue
/retitle Document the reason for >100% memory usage
/label good-first-issue
@rexagod: The label(s) /label good-first-issue cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?
In response to this:
/label good-first-issue
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@rexagod I'd like to work on this issue!
/assign @PrimalPimmy
Thank you for taking this up! 🙌🏼
@rexagod which file would you like me to document this in?
I think the FAQ.md should be the correct place to document this.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.