
High CPU usage when standalone

Open paneq opened this issue 4 years ago • 25 comments

What did you do?

  • I started a MicroK8s cluster

What did you expect to happen?

I expected CPU usage to be around 5%, but it oscillates between 40% and 80%.

  • top -o cpu
PID    COMMAND      %CPU TIME     #TH   #WQ  #PORT MEM    PURG   CMPRS  PGRP  PPID  STATE    BOOSTS           %CPU_ME %CPU_OTHRS UID  FAULTS     COW      MSGSENT
71452  hyperkit     72.0 12:29.26 12    1    39    4485M  0B     391M   60028 60028 sleeping *0[1]            0.00000 0.00000    0    1226131+   460      165

Which OS & Architecture?

  • macOS 10.14.6 (18G4032)
  • x86

I don't know how to obtain the tarball created by microk8s inspect from the VM.

paneq avatar Sep 15 '20 07:09 paneq

Hi @paneq, how did you install MicroK8s? Did you install it with the respective installers? The installers use multipass to provision the VM on MacOS and you can use multipass transfer to push and pull files to and from the VM.
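For reference, a minimal sketch of pulling the inspect tarball out of the VM with multipass. The VM name microk8s-vm and the report path are the installer's defaults and may differ on your system; the timestamped filename below is a placeholder for whatever microk8s inspect actually prints:

```shell
# Generate the inspection report inside the VM
multipass exec microk8s-vm -- sudo microk8s inspect

# Copy it back to the host (replace the filename with the one inspect reported)
multipass transfer microk8s-vm:/var/snap/microk8s/current/inspection-report-20200915_070000.tar.gz .
```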

ktsakalozos avatar Sep 15 '20 08:09 ktsakalozos

We're seeing 20-40% on a single core after a snap install.

ChipWolf avatar Oct 04 '20 20:10 ChipWolf

Could you please provide the hardware specs of the machine running MicroK8s?

With MicroK8s v1.19 we have replaced etcd with dqlite, and dqlite is embedded in the K8s API server.

To pinpoint a CPU hot spot we would like to see some logs. If the microk8s inspect tarball is too much, please consider at least attaching the output of journalctl -n 3000 -u snap.microk8s.daemon-apiserver. Thank you.

ktsakalozos avatar Oct 05 '20 06:10 ktsakalozos

I am seeing this as well with a single-node cluster I set up this afternoon. Regular spikes to 100% on 2 CPU cores with nothing installed except metallb and dns (via microk8s enable metallb dns).

kube-apiserver is using most of the CPU, and the API can become unresponsive for minutes at a time. This happens every few minutes, especially after making any kind of change.

It's running in an Ubuntu 20.04 Server VM installed via ISO, with MicroK8s installed by selecting it during the installation process.

Exact version is v1.19.3-34+a56971609ff35a

Log from running journalctl -n 3000 -u snap.microk8s.daemon-apiserver attached, as is the inspect tarball.

log.txt inspection-report-20201206_214945.tar.gz

markallanson avatar Dec 06 '20 21:12 markallanson

@markallanson regarding the apiserver being unresponsive: there is a service called apiserver-kicker which automatically restarts the apiserver when it detects some change in the network. In its log, I see that it detected changes every few minutes. We can try to temporarily stop the kicker to see if this stops it from restarting the apiserver. You can check whether this works for you: https://github.com/ubuntu/microk8s/issues/1775#issuecomment-734218451
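For anyone who wants to try the same experiment, a sketch of inspecting and pausing the kicker; the unit name follows the snap.microk8s.daemon-* convention the other MicroK8s services use:

```shell
# See what the kicker has been doing (look for repeated network-change detections)
journalctl -u snap.microk8s.daemon-apiserver-kicker -n 100

# Temporarily stop it; this does not persist across a reboot or microk8s stop/start
sudo systemctl stop snap.microk8s.daemon-apiserver-kicker

# Re-enable it once you are done testing
sudo systemctl start snap.microk8s.daemon-apiserver-kicker
```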

balchua avatar Dec 07 '20 10:12 balchua

Yes this makes the api server processes responsive with no lost connectivity - thanks.

The kubelet processes are now going mad, using a lot of memory and CPU, though that must be another issue.

markallanson avatar Dec 07 '20 11:12 markallanson

The problem is worse on CPUs with a high core count.

Upgraded the host from Ubuntu 18 to Ubuntu 20.04, installed VirtualBox, created a VM with 16 CPUs running Ubuntu 20.04, installed the guest additions in the VM, then ran sudo snap install microk8s --classic --channel=latest/stable. I did not apply any of my YAML files; this is just a base install of MicroK8s.

CPU usage goes nuts hours later. After multiple reboots of the VM and the host, and start/stop of MicroK8s, CPU is still very high.

high cpu inspection-report-20201229_220359.tar.gz

grep "error" -r . produces tons of transport errors

johngrabner avatar Dec 30 '20 14:12 johngrabner

I have the same problem. It took ~30% (at peak, even 40-60%) of my i5-8300H CPU and my laptop was noisy. When I turned MicroK8s off, CPU usage dropped to ~2% and the CPU got cooler by 10°C.

(official installation (sudo snap install microk8s --classic) on Ubuntu 20.04)

Mlocik97 avatar Jan 13 '21 20:01 Mlocik97

We're seeing 20-40% on a single core after a snap install.

100% usage when memory is less than 1024 MB (in my case, 640 MB).

Havrushchyk avatar Jan 19 '21 12:01 Havrushchyk

Same problem here, running on Ubuntu Server 20.04 with the standard installation (sudo snap install microk8s). Any ideas?

frequencebanane avatar Jan 27 '21 08:01 frequencebanane

Is it the API server with the high CPU? If so, it would help to attach the output of journalctl -n 4000 -u snap.microk8s.daemon-apiserver to see what is happening.

ktsakalozos avatar Jan 27 '21 09:01 ktsakalozos

We have the same problem on Ubuntu 20.04. api-server.log

In addition to the loaded processor, there is a constant loss of connection with the API server. The very long write and read operations against dqlite are confusing; some of them take several seconds. It seems like required database indexes are missing. We use modern hard drives with an attached SSD cache.

aivanov-citc avatar Feb 12 '21 15:02 aivanov-citc

Same problem here. Installation of 3 nodes in HA mode, Ubuntu 20.04, version v1.20.1-34+e7db93d188d0d1.

kube-apiserver uses most of the CPU and the API may become unresponsive. This happens very often, every few minutes, and is especially noticeable after installing any add-ons that depend on kube-apiserver. Maybe there is a workaround. Can't release the cluster to production :(

eshikhov avatar Feb 12 '21 19:02 eshikhov

In my case, journalctl reports that some transaction is in an infinite loop:

Mar 05 23:03:14 XXXX microk8s.daemon-apiserver[1318]: Trace[2129607300]: ---"Transaction committed" 4ms (23:03:00.914)
Mar 05 23:03:14 XXXX microk8s.daemon-apiserver[1318]: Trace[2129607300]: ---"Transaction prepared" 0ms (23:03:00.915)
Mar 05 23:03:14 XXXX microk8s.daemon-apiserver[1318]: Trace[2129607300]: ---"Transaction committed" 4ms (23:03:00.919)
Mar 05 23:03:14 XXXX microk8s.daemon-apiserver[1318]: Trace[2129607300]: ---"Transaction prepared" 0ms (23:03:00.920)

There are tons of repeating log entries. Even worse, I'm using an NVMe drive and it overheats due to permanently high IOPS.

MicroK8s is installed from the official snap channel. 100% CPU load is observed immediately on a base installation.

OS: Ubuntu 20.04 LTS x86_64; MicroK8s: v1.20.2; Kernel: 5.4.0-66-generic; CPU: Core i7, 8 physical / 16 logical cores; RAM: 24 GB

sakno avatar Mar 05 '21 21:03 sakno

Yes, I also have high CPU load on an idle MicroK8s.

I did nothing but "snap install microk8s --classic", and kube-apiserver is constantly at >10% CPU load.

I see this in the logs every 2 seconds:

# tail -f syslog|grep nil
Mar 23 17:10:02 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:02.977871923Z" level=info msg="Exec process \"d678497a8047ebe4d608e6307b763326577d162dcfa247c4f9822f5b2af6248c\" exits with exit code 0 and error <nil>"
Mar 23 17:10:06 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:06.582351200Z" level=info msg="Exec process \"89b68a58097f7ca0152b20a694c768ebcfe7915c88ed872d8aa5a5d1bc189263\" exits with exit code 0 and error <nil>"
Mar 23 17:10:08 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:08.060196886Z" level=info msg="Exec process \"fea78c5c20ee48f68e61f9ba48243e3b2fe2ae9c72741397ff67e69fb6f9d990\" exits with exit code 0 and error <nil>"
Mar 23 17:10:12 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:12.977476476Z" level=info msg="Exec process \"1931b176410e67953ff600adab55f830d84dd481c9d8d7bb945547640acb3f4a\" exits with exit code 0 and error <nil>"
Mar 23 17:10:16 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:16.593262433Z" level=info msg="Exec process \"3f769b56e281b4d8cd5d542325cdb5c8e5da05db99c4a17bef8b7783b4501d69\" exits with exit code 0 and error <nil>"
Mar 23 17:10:18 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:18.053523348Z" level=info msg="Exec process \"5308879d4ecf563101738df82b56435b784a7fd6ab0348c3792cc33757112cfe\" exits with exit code 0 and error <nil>"
Mar 23 17:10:22 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:22.973033474Z" level=info msg="Exec process \"d93ea8a5dc9c7e5ec22bf2f79afac63f636a622951d71d6c1f83c5861f878aac\" exits with exit code 0 and error <nil>"

journalctl shows:

Mar 23 17:24:59 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:24:59.939428059Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" with command [/bin/calico-node -felix-ready] and timeout 1 (s)"
Mar 23 17:25:00 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:00.064402715Z" level=info msg="Finish piping \"stderr\" of container exec \"764a58a0b354ce851f0a9e04485a3f2182c04daf947d9a3b141dc1aa63aacd30\""
Mar 23 17:25:00 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:00.064553108Z" level=info msg="Finish piping \"stdout\" of container exec \"764a58a0b354ce851f0a9e04485a3f2182c04daf947d9a3b141dc1aa63aacd30\""
Mar 23 17:25:00 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:00.066784612Z" level=info msg="Exec process \"764a58a0b354ce851f0a9e04485a3f2182c04daf947d9a3b141dc1aa63aacd30\" exits with exit code 0 and error <nil>"
Mar 23 17:25:00 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:00.068672127Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" returns with exit code 0"
Mar 23 17:25:02 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:02.040082285Z" level=info msg="ExecSync for \"185837912bb662cf9274aa48c2cbecce5a9253a37feb90449bdcbb3b7df7a450\" with command [/usr/bin/check-status -r] and timeout 1 (s)"
Mar 23 17:25:02 microk8s systemd[1212]: run-containerd-runc-k8s.io-185837912bb662cf9274aa48c2cbecce5a9253a37feb90449bdcbb3b7df7a450-runc.ooflp5.mount: Succeeded.
Mar 23 17:25:02 microk8s systemd[1]: run-containerd-runc-k8s.io-185837912bb662cf9274aa48c2cbecce5a9253a37feb90449bdcbb3b7df7a450-runc.ooflp5.mount: Succeeded.
Mar 23 17:25:02 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:02.122983177Z" level=info msg="Exec process \"83056627d7576e69a1b89b877d8dcc0c3182bde26ac87234c1ab4e16f655d742\" exits with exit code 0 and error <nil>"
Mar 23 17:25:02 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:02.123138847Z" level=info msg="Finish piping \"stdout\" of container exec \"83056627d7576e69a1b89b877d8dcc0c3182bde26ac87234c1ab4e16f655d742\""
Mar 23 17:25:02 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:02.123229848Z" level=info msg="Finish piping \"stderr\" of container exec \"83056627d7576e69a1b89b877d8dcc0c3182bde26ac87234c1ab4e16f655d742\""
Mar 23 17:25:02 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:02.129618433Z" level=info msg="ExecSync for \"185837912bb662cf9274aa48c2cbecce5a9253a37feb90449bdcbb3b7df7a450\" returns with exit code 0"
Mar 23 17:25:05 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:05.797896549Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" with command [/bin/calico-node -felix-live] and timeout 1 (s)"
Mar 23 17:25:05 microk8s systemd[1212]: run-containerd-runc-k8s.io-9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78-runc.jtzht5.mount: Succeeded.
Mar 23 17:25:05 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:05.911052091Z" level=info msg="Finish piping \"stderr\" of container exec \"30c693442011267aa02d6fc4625c4e1198a7c0dfcf21b9e4ff18988a4304571c\""
Mar 23 17:25:05 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:05.911178571Z" level=info msg="Finish piping \"stdout\" of container exec \"30c693442011267aa02d6fc4625c4e1198a7c0dfcf21b9e4ff18988a4304571c\""
Mar 23 17:25:05 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:05.914663745Z" level=info msg="Exec process \"30c693442011267aa02d6fc4625c4e1198a7c0dfcf21b9e4ff18988a4304571c\" exits with exit code 0 and error <nil>"
Mar 23 17:25:05 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:05.916413580Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" returns with exit code 0"
Mar 23 17:25:09 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:09.939588119Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" with command [/bin/calico-node -felix-ready] and timeout 1 (s)"
Mar 23 17:25:10 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:10.052671203Z" level=info msg="Finish piping \"stdout\" of container exec \"3207dc67aac9c6a670bbcbf9b27583be3747e9b9df11be6fe54271e41ec17da8\""
Mar 23 17:25:10 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:10.052813029Z" level=info msg="Finish piping \"stderr\" of container exec \"3207dc67aac9c6a670bbcbf9b27583be3747e9b9df11be6fe54271e41ec17da8\""
Mar 23 17:25:10 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:10.054210554Z" level=info msg="Exec process \"3207dc67aac9c6a670bbcbf9b27583be3747e9b9df11be6fe54271e41ec17da8\" exits with exit code 0 and error <nil>"
Mar 23 17:25:10 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:10.056127991Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" returns with exit code 0"
^C

There are tons of futex() syscalls:

# strace -c -f -p $(pidof kube-apiserver)
strace: Process 4724 attached with 16 threads
^Cstrace: Process 4724 detached
strace: Process 4769 detached
strace: Process 4770 detached
strace: Process 4771 detached
strace: Process 4772 detached
strace: Process 4773 detached
strace: Process 4774 detached
strace: Process 4775 detached
strace: Process 4776 detached
strace: Process 4791 detached
strace: Process 4828 detached
strace: Process 4829 detached
strace: Process 4830 detached
strace: Process 4831 detached
strace: Process 4832 detached
strace: Process 9753 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 72.94    3.195455         642      4977       834 futex
 10.30    0.451047        4902        92           epoll_wait
  8.58    0.375965          91      4092         1 epoll_pwait
  5.53    0.242349          98      2448           nanosleep
  1.01    0.044349          60       737           write
  0.98    0.043014          41      1045       410 read
  0.28    0.012273        3068         4           io_getevents
  0.12    0.005421          32       167        74 epoll_ctl
  0.10    0.004333         120        36           tgkill
  0.05    0.001993          53        37           getpid
  0.03    0.001514          42        36         1 rt_sigreturn
  0.01    0.000524         131         4           io_submit
  0.01    0.000430          18        23           setsockopt
  0.01    0.000426          42        10         5 accept4
  0.01    0.000396          22        18           mprotect
  0.01    0.000376          53         7           close
  0.01    0.000331          30        11           sched_yield
  0.00    0.000193          32         6           getsockname
  0.00    0.000165          82         2           socket
  0.00    0.000152          76         2           connect
  0.00    0.000129           7        17           getrandom
  0.00    0.000097          97         1         1 restart_syscall
  0.00    0.000052          26         2           getpeername
  0.00    0.000007           7         1           getsockopt
  0.00    0.000006           6         1           shutdown
------ ----------- ----------- --------- --------- ----------------
100.00    4.380997                 13776      1326 total

Disabling ha-cluster made no difference.

OS: Ubuntu 20.04 Server LTS x86_64 with all updates; MicroK8s: v1.20.4 classic; Kernel: 5.4.0-67-generic; CPU: 4 vCPU under a Proxmox/KVM hypervisor on a dual-CPU Xeon L5630 @ 2.13 GHz; RAM: 8 GB

devZer0 avatar Mar 23 '21 17:03 devZer0

Observing this same problem on Ubuntu 20.04 (running on a Raspberry Pi 4) :(

aleric323 avatar May 30 '21 02:05 aleric323

I confirm. On a cluster of 5x Raspberry Pi 4 8GB + Ubuntu 21.10 + MicroK8s 1.22/stable, after disabling ha-cluster the load on all CPUs dropped from ~80% (some CPUs were at ~60%) to less than 10%. No custom pods running, just a fresh install with 4 nodes joined to 1 master.

mzatko avatar Nov 02 '21 23:11 mzatko

I have tried various distributions of Kubernetes and every one of them had this problem. So this isn't a MicroK8s problem, it's a Kubernetes problem in general. When it gets fixed in upstream Kubernetes, I suppose all the distributions will benefit.

phjr avatar Dec 17 '21 08:12 phjr

https://github.com/kubernetes/kubernetes/issues/75942

https://stackoverflow.com/questions/58405476/kubernetes-high-cpu-usage

https://github.com/docker/for-mac/issues/3065

quite interesting: https://www.datadoghq.com/blog/engineering/moving-a-jobsystem-to-kubernetes/

devZer0 avatar Dec 19 '21 01:12 devZer0

I have a standalone MicroK8s on a Raspberry Pi 4, 8GB, on Ubuntu 21.10. Disabling the ha-cluster feature dropped CPU load and utilization significantly:

microk8s disable ha-cluster

tanzerm avatar Mar 25 '22 10:03 tanzerm

microk8s disable ha-cluster on my single-node Ubuntu Server 20.04 LTS helped quite a bit. It reduced idle CPU usage by about 75% with no pods running. Clearly there is no reason to have ha-cluster enabled if you only plan to run a single node anyway.

WARNING WARNING WARNING: Only do this on a new instance, do NOT do this on an existing deployment! You will lose all your data!
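For anyone attempting this on a brand-new node, a hedged sketch of checking and then disabling HA. The exact microk8s status output format varies between releases, and newer releases may prompt for confirmation on disable:

```shell
# Confirm whether the HA datastore is currently active
microk8s status | grep -i high-availability

# DESTRUCTIVE on existing data: only run on a fresh single-node install
microk8s disable ha-cluster

# Wait for the node to come back up with the non-HA datastore
microk8s status --wait-ready
```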

Gbps avatar Mar 26 '22 17:03 Gbps

Running MicroK8s with ha-cluster disabled, I am seeing a lot of futex(0x99afc38, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=2624764}) = -1 ETIMEDOUT (Connection timed out) messages with strace.

markododa avatar Mar 28 '22 20:03 markododa

If you decide to go with the microk8s disable ha-cluster option, be aware that it is destructive: https://github.com/canonical/microk8s/issues/1577

juanitocalero avatar May 17 '22 09:05 juanitocalero

I had the CPU going crazy when my VM had 2 GB of RAM. Increasing the RAM to 3 GB solved it, implying it was related to low memory.

dmitrynovik avatar May 24 '22 01:05 dmitrynovik

I'm running MicroK8s on Ubuntu on a Synology Disk Station.

2 CPUs at 2 GHz and 4 GB of RAM

My CPU load was between 30% and 70% at idle. Disabling the high-availability cluster reduced the idle CPU load to below 15%.

@tanzerm: Thank You!

roestzwiee avatar May 31 '22 07:05 roestzwiee

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 26 '23 09:04 stale[bot]