
kubectl top shows memory usage > 200%

Open joemilacek opened this issue 3 years ago • 2 comments

What happened:

kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-143-4-57.us-east-2.compute.internal 211m 21% 2838Mi 240%
ip-10-143-4-98.us-east-2.compute.internal 2551m 85% 12346Mi 93%
ip-10-143-5-15.us-east-2.compute.internal 599m 19% 9894Mi 75%
ip-10-143-5-223.us-east-2.compute.internal 204m 20% 2899Mi 245%
ip-10-143-6-214.us-east-2.compute.internal 553m 18% 9950Mi 75%
ip-10-143-6-243.us-east-2.compute.internal 556m 55% 3100Mi 262%

What you expected to happen:

Nodes use less than 100%

Anything else we need to know?:

Only happening on nodes with "controlplane,etcd" role.

Example node description when memory shows 245%:

Addresses:
  InternalIP: 10.143.5.223
Capacity:
  attachable-volumes-aws-ebs: 25
  cpu: 2
  ephemeral-storage: 31445996Ki
  hugepages-1Gi: 0
  hugepages-2Mi: 0
  memory: 3831048Ki
  pods: 110
Allocatable:
  attachable-volumes-aws-ebs: 25
  cpu: 1
  ephemeral-storage: 26833146218
  hugepages-1Gi: 0
  hugepages-2Mi: 0
  memory: 1209608Ki
  pods: 110

Environment:

  • k8s 1.74
  • metrics server version 0.4.1

Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:
Annotations:
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2021-05-05T17:59:58Z
  Resource Version:    148926115
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 3573187b-a3a9-4b41-952b-20fb8745fe5c
Spec:
  Group:                    metrics.k8s.io
  Group Priority Minimum:   100
  Insecure Skip TLS Verify: true
  Service:
    Name:       metrics-server
    Namespace:  kube-system
    Port:       443
  Version:          v1beta1
  Version Priority: 100
Status:
  Conditions:
    Last Transition Time:  2022-07-15T16:21:47Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:

/kind bug

joemilacek avatar Jul 18 '22 13:07 joemilacek

Hi @joemilacek, I believe that the results displayed by kubectl top nodes are correct:

  • allocatable memory: 1209608Ki = 1181Mi
  • memory usage: 2899Mi --> 2899 / 1181 = 245%

and 245% is exactly what you have:

kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-143-5-223.us-east-2.compute.internal 204m 20% 2899Mi 245%

Here you have more details:

For nodes, I believe it's the memory usage as reported by the system-wide node cgroup. So it may include stuff that's not in a pod, IIRC. We just report the information given to us by cadvisor on the node.

There is another open issue to improve documentation of kubectl top and explain how percentages are calculated, so I think we could close this one as a duplicate.

tkrishtop avatar Jul 19 '22 22:07 tkrishtop

I'd like to add an update on why kubectl shows that memory usage is larger than 100% and how to fix that.

Why is the percentage larger than 100%

By default, kubectl top node displays the ratio node_memory_working_set_bytes / Allocatable memory.

To verify it, retrieve node_memory_working_set_bytes with:

kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/resource | grep node_memory_working_set_bytes

and the allocatable memory with:

kubectl describe node <NODE_NAME> | grep -i capacity -A 10

"Allocatable" is a logical limit, not a physical memory size: [Allocatable] = [Node Capacity] - [system-reserved] - [Hard-Eviction-Thresholds]. In contrast, the node memory usage collected by the Metrics API (metrics.k8s.io) reflects the memory actually in use on the node host. If system-reserved or the hard-eviction threshold is configured to be large, MEMORY% can exceed 100%.
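The arithmetic above can be sketched with the numbers from this issue's node description (Capacity and Allocatable are taken from the report; the combined system-reserved/eviction reservation is derived from them, not from the node's actual kubelet configuration):

```python
# Why MEMORY% can exceed 100% when the denominator is Allocatable.
# Capacity and Allocatable come from the node description in this issue.

capacity_ki = 3831048        # node Capacity memory (Ki)
allocatable_ki = 1209608     # node Allocatable memory (Ki)

# [Allocatable] = [Capacity] - [system-reserved] - [Hard-Eviction-Thresholds],
# so the combined reservation on this node is:
reserved_ki = capacity_ki - allocatable_ki   # 2621440 Ki = 2.5 Gi

usage_mi = 2899                              # node_memory_working_set_bytes, in Mi
allocatable_mi = allocatable_ki / 1024       # ~1181 Mi

memory_percent = usage_mi / allocatable_mi * 100
print(round(memory_percent))                 # 245: usage exceeds Allocatable
```

With 2.5 Gi of the 3.7 Gi node reserved, the working set can easily be larger than the remaining Allocatable, which is exactly what produces percentages above 100%.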

Please refer to this PR for the detailed explanation.

How to fix that

Starting from k8s 1.23, kubectl top node has an option --show-capacity. This option defines how the percentage is computed:

  • (Default invocation) kubectl top node = kubectl top node --show-capacity=false: displays node_memory_working_set_bytes / Allocatable memory and could exceed 100% as explained above.
  • kubectl top node --show-capacity=true: displays node_memory_working_set_bytes / Capacity memory which is always within 100%.

From the example provided by @joemilacek for the node showing 245% memory:

node_memory_working_set_bytes = 2899Mi
Allocatable memory = 1209608Ki = 1181Mi
Capacity memory = 3831048Ki = 3741Mi

  • kubectl top node would show 2899Mi / 1181Mi = 245%
  • kubectl top node --show-capacity=true would show 2899Mi / 3741Mi = 77%
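The two invocations differ only in the denominator, which can be checked directly (values taken from the node description in this issue):

```python
# Compare the two denominators kubectl top node can use for MEMORY%.
usage_mi = 2899                   # node_memory_working_set_bytes
allocatable_mi = 1209608 / 1024   # ~1181 Mi
capacity_mi = 3831048 / 1024      # ~3741 Mi

# Default (--show-capacity=false): divide by Allocatable, can exceed 100%.
print(f"{usage_mi / allocatable_mi:.0%}")   # 245%
# --show-capacity=true: divide by Capacity, always within 100%.
print(f"{usage_mi / capacity_mi:.0%}")      # 77%
```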

tkrishtop avatar Sep 19 '22 09:09 tkrishtop

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 18 '22 10:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jan 17 '23 11:01 k8s-triage-robot

/remove-lifecycle rotten
/label good first issue
/retitle Document the reason for >100% memory usage

rexagod avatar Jan 29 '23 22:01 rexagod

/label good-first-issue

rexagod avatar Jan 29 '23 22:01 rexagod

@rexagod: The label(s) /label good-first-issue cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 29 '23 22:01 k8s-ci-robot

@rexagod I'd like to work on this issue!

PrimalPimmy avatar Jan 30 '23 09:01 PrimalPimmy

/assign @PrimalPimmy

Thank you for taking this up! 🙌🏼

rexagod avatar Jan 30 '23 09:01 rexagod

@rexagod in which file would you like me to document this?

PrimalPimmy avatar Feb 01 '23 17:02 PrimalPimmy

I think the FAQ.md should be the correct place to document this.

rexagod avatar Feb 01 '23 19:02 rexagod

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 02 '23 20:05 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jun 01 '23 20:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jul 01 '23 20:07 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:


/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jul 01 '23 20:07 k8s-ci-robot