microk8s
High CPU usage when standalone
What did you do?
- I started a MicroK8s cluster
What did you expect to happen?
I expected CPU usage to be around 5%, but it oscillates between 40% and 80%:
top -o cpu
PID COMMAND %CPU TIME #TH #WQ #PORT MEM PURG CMPRS PGRP PPID STATE BOOSTS %CPU_ME %CPU_OTHRS UID FAULTS COW MSGSENT
71452 hyperkit 72.0 12:29.26 12 1 39 4485M 0B 391M 60028 60028 sleeping *0[1] 0.00000 0.00000 0 1226131+ 460 165
Which OS & Architecture?
- macOS 10.14.6 (18G4032)
- x86
I don't know how to obtain the tarball created by microk8s inspect from the VM.
Hi @paneq, how did you install MicroK8s? Did you install it with the respective installers? The installers use multipass to provision the VM on macOS, and you can use multipass transfer to push and pull files to and from the VM.
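The suggestion above can be sketched as a shell snippet. The VM name `microk8s-vm` and the report path under `/var/snap/microk8s/current/` are assumptions (check the path that `microk8s inspect` actually prints on your machine):

```shell
# Sketch: run inspect inside the multipass VM and pull the newest report
# tarball to the host. VM name and report location are assumptions.

# Helper: report names embed a timestamp (inspection-report-YYYYMMDD_HHMMSS.tar.gz),
# so a plain lexical sort puts the newest one last.
latest_report() {
  printf '%s\n' "$@" | sort | tail -n 1
}

if command -v multipass >/dev/null 2>&1; then
  VM=microk8s-vm
  multipass exec "$VM" -- sudo microk8s inspect
  # Word splitting on the substitution is intentional: one filename per word.
  REPORT=$(latest_report $(multipass exec "$VM" -- sh -c 'ls /var/snap/microk8s/current/inspection-report-*.tar.gz'))
  multipass transfer "$VM:$REPORT" .
fi
```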
We're seeing 20-40% on a single core after a snap install.
Could you please provide the hardware specs of the machine running MicroK8s?
With MicroK8s v1.19 we have replaced etcd with dqlite, and dqlite is embedded in the K8s API server.
To pinpoint a CPU hot spot we would like to see some logs. If the microk8s inspect tarball is too much, please consider at least attaching the output of journalctl -n 3000 -u snap.microk8s.daemon-apiserver. Thank you.
I am seeing this as well with a single-node cluster I set up this afternoon: regular spikes to 100% on 2 CPU cores with nothing installed except metallb and dns (via microk8s enable metallb dns).
kube-apiserver is using most of the CPU, and the API can become unresponsive for minutes at a time. This seems to happen every few minutes, especially after making any kind of change.
It's running in an Ubuntu 20.04 server VM installed via ISO, with MicroK8s installed by selecting it during the installation process.
Exact version is v1.19.3-34+a56971609ff35a
Log from running journalctl -n 3000 -u snap.microk8s.daemon-apiserver attached, as is the inspect tarball.
@markallanson regarding the apiserver being unresponsive: there is a service called apiserver-kicker which automatically restarts the apiserver when it detects a change in the network. In its log, I see that it detected changes every few minutes.
We can try temporarily stopping the kicker to see if this stops it from restarting the apiserver.
You can check whether this works for you: https://github.com/ubuntu/microk8s/issues/1775#issuecomment-734218451
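The experiment above can be sketched as follows. The unit name snap.microk8s.daemon-apiserver-kicker comes from this thread; the grep pattern used to count kicker actions is an assumption about its log wording, so adjust it to what your journal actually shows:

```shell
# Sketch: save the kicker's journal, gauge how often it acted, then stop it.

# Helper: count kicker actions in a saved journal excerpt. "restart" as the
# marker is an assumption about the log wording.
count_kicker_lines() {
  grep -ci 'restart' "$1" || true
}

if command -v systemctl >/dev/null 2>&1 && systemctl list-units --all 2>/dev/null | grep -q snap.microk8s.daemon-apiserver-kicker; then
  sudo journalctl -u snap.microk8s.daemon-apiserver-kicker --no-pager > kicker.log
  echo "kicker actions logged: $(count_kicker_lines kicker.log)"
  sudo systemctl stop snap.microk8s.daemon-apiserver-kicker
fi
```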
Yes, this makes the API server processes responsive with no lost connectivity - thanks.
The kubelet processes are now going mad, using a lot of memory and CPU, though that must be another issue.
The problem is worse on high-core-count CPUs.
- Upgraded the host from Ubuntu 18 to Ubuntu 20.04
- Installed VirtualBox and created a VM with 16 CPUs running Ubuntu 20.04
- Installed Guest Additions in the VM
- Ran sudo snap install microk8s --classic --channel=latest/stable
- Did not apply any of my YAML files, just a base install of MicroK8s
CPU usage went through the roof hours later. After multiple reboots of the VM and host and starting/stopping MicroK8s, the CPU is still very high.
inspection-report-20201229_220359.tar.gz
grep "error" -r . produces tons of transport errors.
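To make "tons of transport errors" countable, a small text-processing sketch over the unpacked report can rank the error lines. The directory name is whatever extracting the tarball produced (an assumption here):

```shell
# Sketch: rank transport-error lines in an unpacked inspection report so the
# most frequent message floats to the top.
top_errors() {
  # -r recurse, -h drop filenames, -o print only the match, -i ignore case
  grep -rhoi 'transport[: ][^"]*' "$1" 2>/dev/null | sort | uniq -c | sort -rn | head
}

# Hypothetical directory name; use whatever tar -xzf produced for you.
if [ -d ./inspection-report ]; then
  top_errors ./inspection-report
fi
```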
I have the same problem: it took ~30% (at peak, even 40-60%) of my i5-8300H CPU and my laptop was noisy. When I turned MicroK8s off, I had only ~2% CPU usage and my CPU got cooler by 10°C.
(Official installation (sudo snap install microk8s --classic) on Ubuntu 20.04.)
We're seeing 20-40% on a single core after a snap install.
100% usage when memory is less than 1024 MB (in my case, 640 MB).
Same problem here, running on Ubuntu Server 20.04 with the standard installation (sudo snap install microk8s). Any ideas?
Is it the API server with the high CPU? If so, it would help to attach the output of journalctl -n 4000 -u snap.microk8s.daemon-apiserver to see what is happening.
We have the same problem on Ubuntu 20.04. api-server.log
In addition to the high processor load, there is constant loss of connection to the API server. The very long dqlite write and read operations are confusing; some of them take several seconds. It seems like required database indexes are missing. We use modern hard drives with an SSD cache.
Same problem here. Installation of 3 nodes in HA mode, Ubuntu 20.04, version v1.20.1-34+e7db93d188d0d1.
kube-apiserver uses most of the CPU and the API may become unresponsive; this happens very often, every few minutes. It is especially noticeable after installing any add-ons that depend on kube-apiserver. Maybe there is a workaround? We can't release the cluster to production :(
In my case, journalctl reports what looks like a transaction stuck in an infinite loop:
Mar 05 23:03:14 XXXX microk8s.daemon-apiserver[1318]: Trace[2129607300]: ---"Transaction committed" 4ms (23:03:00.914)
Mar 05 23:03:14 XXXX microk8s.daemon-apiserver[1318]: Trace[2129607300]: ---"Transaction prepared" 0ms (23:03:00.915)
Mar 05 23:03:14 XXXX microk8s.daemon-apiserver[1318]: Trace[2129607300]: ---"Transaction committed" 4ms (23:03:00.919)
Mar 05 23:03:14 XXXX microk8s.daemon-apiserver[1318]: Trace[2129607300]: ---"Transaction prepared" 0ms (23:03:00.920)
There are tons of repeating log entries. Even worse, I'm using an NVMe drive and it overheats due to the permanently high IOPS.
MicroK8s is installed from the official snap channel. 100% CPU load is observed immediately on a base installation.
OS: Ubuntu 20.04 LTS x86_64; MicroK8s: v1.20.2; Kernel: 5.4.0-66-generic; CPU: Core i7, 8 physical / 16 logical cores; RAM: 24 GB
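A quick way to see whether one trace dominates a saved apiserver journal is to tally the trace IDs; this sketch assumes the Trace[<id>] format shown in the excerpt above and a journal saved to apiserver.log (a hypothetical filename):

```shell
# Sketch: count occurrences of each apiserver trace ID to spot a loop.
top_traces() {
  grep -o 'Trace\[[0-9]*\]' "$1" | sort | uniq -c | sort -rn | head -n 5
}

# e.g. journalctl -u snap.microk8s.daemon-apiserver --no-pager > apiserver.log
if [ -f apiserver.log ]; then
  top_traces apiserver.log
fi
```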
Yes, I also have high CPU load on an idle MicroK8s.
I did nothing but snap install microk8s --classic, and kube-apiserver is constantly at >10% CPU load.
I see this in the logs every 2 seconds:
# tail -f syslog|grep nil
Mar 23 17:10:02 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:02.977871923Z" level=info msg="Exec process \"d678497a8047ebe4d608e6307b763326577d162dcfa247c4f9822f5b2af6248c\" exits with exit code 0 and error <nil>"
Mar 23 17:10:06 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:06.582351200Z" level=info msg="Exec process \"89b68a58097f7ca0152b20a694c768ebcfe7915c88ed872d8aa5a5d1bc189263\" exits with exit code 0 and error <nil>"
Mar 23 17:10:08 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:08.060196886Z" level=info msg="Exec process \"fea78c5c20ee48f68e61f9ba48243e3b2fe2ae9c72741397ff67e69fb6f9d990\" exits with exit code 0 and error <nil>"
Mar 23 17:10:12 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:12.977476476Z" level=info msg="Exec process \"1931b176410e67953ff600adab55f830d84dd481c9d8d7bb945547640acb3f4a\" exits with exit code 0 and error <nil>"
Mar 23 17:10:16 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:16.593262433Z" level=info msg="Exec process \"3f769b56e281b4d8cd5d542325cdb5c8e5da05db99c4a17bef8b7783b4501d69\" exits with exit code 0 and error <nil>"
Mar 23 17:10:18 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:18.053523348Z" level=info msg="Exec process \"5308879d4ecf563101738df82b56435b784a7fd6ab0348c3792cc33757112cfe\" exits with exit code 0 and error <nil>"
Mar 23 17:10:22 microk8s microk8s.daemon-containerd[4637]: time="2021-03-23T17:10:22.973033474Z" level=info msg="Exec process \"d93ea8a5dc9c7e5ec22bf2f79afac63f636a622951d71d6c1f83c5861f878aac\" exits with exit code 0 and error <nil>"
journalctl shows:
Mar 23 17:24:59 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:24:59.939428059Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" with command [/bin/calico-node -felix-ready] and timeout 1 (s)"
Mar 23 17:25:00 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:00.064402715Z" level=info msg="Finish piping \"stderr\" of container exec \"764a58a0b354ce851f0a9e04485a3f2182c04daf947d9a3b141dc1aa63aacd30\""
Mar 23 17:25:00 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:00.064553108Z" level=info msg="Finish piping \"stdout\" of container exec \"764a58a0b354ce851f0a9e04485a3f2182c04daf947d9a3b141dc1aa63aacd30\""
Mar 23 17:25:00 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:00.066784612Z" level=info msg="Exec process \"764a58a0b354ce851f0a9e04485a3f2182c04daf947d9a3b141dc1aa63aacd30\" exits with exit code 0 and error <nil>"
Mar 23 17:25:00 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:00.068672127Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" returns with exit code 0"
Mar 23 17:25:02 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:02.040082285Z" level=info msg="ExecSync for \"185837912bb662cf9274aa48c2cbecce5a9253a37feb90449bdcbb3b7df7a450\" with command [/usr/bin/check-status -r] and timeout 1 (s)"
Mar 23 17:25:02 microk8s systemd[1212]: run-containerd-runc-k8s.io-185837912bb662cf9274aa48c2cbecce5a9253a37feb90449bdcbb3b7df7a450-runc.ooflp5.mount: Succeeded.
Mar 23 17:25:02 microk8s systemd[1]: run-containerd-runc-k8s.io-185837912bb662cf9274aa48c2cbecce5a9253a37feb90449bdcbb3b7df7a450-runc.ooflp5.mount: Succeeded.
Mar 23 17:25:02 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:02.122983177Z" level=info msg="Exec process \"83056627d7576e69a1b89b877d8dcc0c3182bde26ac87234c1ab4e16f655d742\" exits with exit code 0 and error <nil>"
Mar 23 17:25:02 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:02.123138847Z" level=info msg="Finish piping \"stdout\" of container exec \"83056627d7576e69a1b89b877d8dcc0c3182bde26ac87234c1ab4e16f655d742\""
Mar 23 17:25:02 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:02.123229848Z" level=info msg="Finish piping \"stderr\" of container exec \"83056627d7576e69a1b89b877d8dcc0c3182bde26ac87234c1ab4e16f655d742\""
Mar 23 17:25:02 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:02.129618433Z" level=info msg="ExecSync for \"185837912bb662cf9274aa48c2cbecce5a9253a37feb90449bdcbb3b7df7a450\" returns with exit code 0"
Mar 23 17:25:05 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:05.797896549Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" with command [/bin/calico-node -felix-live] and timeout 1 (s)"
Mar 23 17:25:05 microk8s systemd[1212]: run-containerd-runc-k8s.io-9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78-runc.jtzht5.mount: Succeeded.
Mar 23 17:25:05 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:05.911052091Z" level=info msg="Finish piping \"stderr\" of container exec \"30c693442011267aa02d6fc4625c4e1198a7c0dfcf21b9e4ff18988a4304571c\""
Mar 23 17:25:05 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:05.911178571Z" level=info msg="Finish piping \"stdout\" of container exec \"30c693442011267aa02d6fc4625c4e1198a7c0dfcf21b9e4ff18988a4304571c\""
Mar 23 17:25:05 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:05.914663745Z" level=info msg="Exec process \"30c693442011267aa02d6fc4625c4e1198a7c0dfcf21b9e4ff18988a4304571c\" exits with exit code 0 and error <nil>"
Mar 23 17:25:05 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:05.916413580Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" returns with exit code 0"
Mar 23 17:25:09 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:09.939588119Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" with command [/bin/calico-node -felix-ready] and timeout 1 (s)"
Mar 23 17:25:10 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:10.052671203Z" level=info msg="Finish piping \"stdout\" of container exec \"3207dc67aac9c6a670bbcbf9b27583be3747e9b9df11be6fe54271e41ec17da8\""
Mar 23 17:25:10 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:10.052813029Z" level=info msg="Finish piping \"stderr\" of container exec \"3207dc67aac9c6a670bbcbf9b27583be3747e9b9df11be6fe54271e41ec17da8\""
Mar 23 17:25:10 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:10.054210554Z" level=info msg="Exec process \"3207dc67aac9c6a670bbcbf9b27583be3747e9b9df11be6fe54271e41ec17da8\" exits with exit code 0 and error <nil>"
Mar 23 17:25:10 microk8s microk8s.daemon-containerd[21071]: time="2021-03-23T17:25:10.056127991Z" level=info msg="ExecSync for \"9f7726d2f2e77bedc3af2636cd5d141760de0a18bb2b5e7202649bb067238c78\" returns with exit code 0"
^C
There are tons of futex() syscalls:
# strace -c -f -p $(pidof kube-apiserver)
strace: Process 4724 attached with 16 threads
^Cstrace: Process 4724 detached
strace: Process 4769 detached
strace: Process 4770 detached
strace: Process 4771 detached
strace: Process 4772 detached
strace: Process 4773 detached
strace: Process 4774 detached
strace: Process 4775 detached
strace: Process 4776 detached
strace: Process 4791 detached
strace: Process 4828 detached
strace: Process 4829 detached
strace: Process 4830 detached
strace: Process 4831 detached
strace: Process 4832 detached
strace: Process 9753 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
72.94 3.195455 642 4977 834 futex
10.30 0.451047 4902 92 epoll_wait
8.58 0.375965 91 4092 1 epoll_pwait
5.53 0.242349 98 2448 nanosleep
1.01 0.044349 60 737 write
0.98 0.043014 41 1045 410 read
0.28 0.012273 3068 4 io_getevents
0.12 0.005421 32 167 74 epoll_ctl
0.10 0.004333 120 36 tgkill
0.05 0.001993 53 37 getpid
0.03 0.001514 42 36 1 rt_sigreturn
0.01 0.000524 131 4 io_submit
0.01 0.000430 18 23 setsockopt
0.01 0.000426 42 10 5 accept4
0.01 0.000396 22 18 mprotect
0.01 0.000376 53 7 close
0.01 0.000331 30 11 sched_yield
0.00 0.000193 32 6 getsockname
0.00 0.000165 82 2 socket
0.00 0.000152 76 2 connect
0.00 0.000129 7 17 getrandom
0.00 0.000097 97 1 1 restart_syscall
0.00 0.000052 26 2 getpeername
0.00 0.000007 7 1 getsockopt
0.00 0.000006 6 1 shutdown
------ ----------- ----------- --------- --------- ----------------
100.00 4.380997 13776 1326 total
Disabling ha-cluster made no difference.
OS: Ubuntu 20.04 Server LTS x86_64 with all updates; MicroK8s: v1.20.4 classic; Kernel: 5.4.0-67-generic; CPU: 4 vCPUs on a proxmox/kvm hypervisor (dual-CPU Xeon L5630 @ 2.13 GHz); RAM: 8 GB
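A summary like the table above comes from a bounded strace run. This sketch captures one and extracts the costliest syscall; the awk filter assumes the standard `strace -c` column layout (% time, seconds, usecs/call, calls, errors, syscall):

```shell
# Sketch: bounded syscall summary of kube-apiserver, then pick the top call.
capture_summary() {
  # strace prints its -c summary when detached; timeout bounds the run.
  sudo timeout 30 strace -c -f -p "$(pidof kube-apiserver)" 2> strace-summary.txt || true
}

# Helper: from an `strace -c` table, print "<seconds> <syscall>" for the call
# that consumed the most time (skips the header, rule lines, and total row).
top_syscall() {
  awk 'NF >= 5 && $1 + 0 == $1 && $NF != "total" { print $2, $NF }' "$1" | sort -rn | head -n 1
}

if command -v strace >/dev/null 2>&1 && pidof kube-apiserver >/dev/null 2>&1; then
  capture_summary
  top_syscall strace-summary.txt
fi
```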
Observing this same problem on Ubuntu 20.04 (running on a Raspberry Pi 4) :(
I confirm: on a cluster of 5x Raspberry Pi 4 8GB + Ubuntu 21.10 + MicroK8s 1.22/stable, after disabling ha-cluster the load on all CPUs dropped from ~80% (some CPUs were at ~60%) to less than 10%. No custom pods running, just a fresh install with 4 nodes joined to 1 master.
I have tried various distributions of Kubernetes and every one of them had this problem. So this isn't a MicroK8s problem, it's a Kubernetes problem in general. When it gets fixed in upstream Kubernetes, I suppose all the distributions will benefit.
https://github.com/kubernetes/kubernetes/issues/75942
https://stackoverflow.com/questions/58405476/kubernetes-high-cpu-usage
https://github.com/docker/for-mac/issues/3065
Quite interesting: https://www.datadoghq.com/blog/engineering/moving-a-jobsystem-to-kubernetes/
I have a standalone MicroK8s on a Raspberry Pi 4 8GB on Ubuntu 21.10 - disabling the ha-cluster feature dropped CPU load and utilization significantly:
microk8s disable ha-cluster
microk8s disable ha-cluster on my single-node Ubuntu Server 20.04 LTS helped quite a bit. It reduced idle CPU usage by about 75% with no pods running. Clearly there is no reason to have ha-cluster enabled if you only plan to have a single node anyway.
WARNING WARNING WARNING Only do this on a new instance, do NOT do this on an existing deployment! You will lose all your data!
Running MicroK8s with ha-cluster disabled, I am seeing a lot of
futex(0x99afc38, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=2624764}) = -1 ETIMEDOUT (Connection timed out)
messages with strace.
If you decide to go with the microk8s disable ha-cluster option, be aware that it is destructive: https://github.com/canonical/microk8s/issues/1577
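Given how destructive the operation is, a guard like the one below may be worth running first. The non-kube-system pod check is a minimal illustration of "is this node actually fresh?", not an official safeguard:

```shell
# Sketch: refuse to disable ha-cluster if any pods exist outside kube-system.

# Helper: count non-kube-system pods from `kubectl get pods -A --no-headers`
# input on stdin.
non_system_pods() {
  grep -cv '^kube-system' || true
}

if command -v microk8s >/dev/null 2>&1; then
  n=$(microk8s kubectl get pods -A --no-headers 2>/dev/null | non_system_pods)
  if [ "$n" -eq 0 ]; then
    microk8s disable ha-cluster
  else
    echo "refusing: $n pod(s) outside kube-system would be lost" >&2
  fi
fi
```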
I had the CPU going crazy when my VM had 2 GB of RAM. Increasing the RAM to 3 GB solved it, implying it was related to low memory.
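The low-memory reports in this thread suggest a quick check before digging into logs. The ~1 GiB threshold below comes from the anecdotes here, not from any official requirement:

```shell
# Sketch: warn when total RAM is under the ~1 GiB level this thread associates
# with high CPU usage.

# Helper: extract total MiB from `free -m` output on stdin.
mem_total_mb() {
  awk '/^Mem:/ { print $2 }'
}

if command -v free >/dev/null 2>&1; then
  total=$(free -m | mem_total_mb)
  if [ "$total" -lt 1024 ]; then
    echo "only ${total} MB RAM: the reports above suggest expecting high CPU" >&2
  fi
fi
```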
I'm running MicroK8s on Ubuntu on a Synology DiskStation with 2 CPUs at 2 GHz and 4 GB of RAM.
My CPU load was between 30% and 70% in idle mode. Disabling the high-availability cluster reduced the idle CPU load to below 15%.
@tanzerm: Thank You!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.