apiserver timeouts on SSD
Summary
The cluster I'm managing has gone into a bad state where pods are not being scheduled or terminated (without forcing). Kubernetes provides no Events pointing me to the problem, and the nodes seem available. I ended up digging into log files to try to find the cause.
One thing I've noticed looking in /var/log/syslog is what appears to me to be long API calls from microk8s.daemon-kubelite. Specifically one that looks like the "Raspberry Pi with low disk performance" case on your troubleshooting page. However, these nodes are desktop-grade consumer hardware with SSDs attached.
I'm seeing "Object stored in database" transition times of 0.8 - 32 seconds.
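For reference, this is roughly how I've been pulling those slow entries out of the logs (the grep patterns are just what I've been eyeballing, so treat them as illustrative rather than anything definitive):
# apiserver trace lines mentioning the slow database writes
grep "Object stored in database" /var/log/syslog | tail -n 50
# kubelite's own journal, in case syslog is missing context
sudo journalctl -u snap.microk8s.daemon-kubelite --since "1 hour ago" | grep -i "trace"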
Drive Performance
Node 1
node@node1:~$ sudo hdparm -Tt /dev/sdc
/dev/sdc:
Timing cached reads: 37266 MB in 1.98 seconds = 18806.90 MB/sec
Timing buffered disk reads: 1572 MB in 3.00 seconds = 523.46 MB/sec
node@node1:~$ dd if=/dev/zero of=/tmp/test1.img bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.779965 s, 1.4 GB/s
Node 2
node@node2:~$ sudo hdparm -Tt /dev/sdc
/dev/sdc:
Timing cached reads: 38138 MB in 1.98 seconds = 19248.73 MB/sec
Timing buffered disk reads: 1522 MB in 3.00 seconds = 506.86 MB/sec
node@node2:~$ dd if=/dev/zero of=/tmp/test1.img bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.784317 s, 1.4 GB/s
Node 3
node@node3:~$ sudo hdparm -Tt /dev/sdc
/dev/sdc:
Timing cached reads: 37240 MB in 1.98 seconds = 18795.10 MB/sec
Timing buffered disk reads: 1606 MB in 3.00 seconds = 534.94 MB/sec
node@node3:~$ dd if=/dev/zero of=/tmp/test1.img bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.782774 s, 1.4 GB/s
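If sequential throughput isn't the right metric here, I'm also happy to run an fsync-latency test of the sort usually suggested for etcd-style datastores. Something like the following is what I had in mind (the fio job parameters are just a sketch, and I'd point --directory at the same disk the datastore lives on):
mkdir -p /tmp/fio-test
fio --name=fsync-latency --directory=/tmp/fio-test --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
and then read off the fsync/fdatasync latency percentiles from the output.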
Introspection Report
The inspection is currently hanging on "Inspect kubernetes cluster", so I'm unable to attach a tarball. Maybe it's a longer-running task than I expected (how long should this take?).
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-k8s-dqlite is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-apiserver-kicker is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Logs
I'm happy to provide some logs, such as /var/log/syslog, if helpful, though I'd need to be assured there's no sensitive data in them before uploading.
Thanks for any help in diagnosing this issue.
Thank you for reporting this, @jamesstidard.
Would it be possible to share the logs of kubelite, k8s-dqlite, and the apiserver-kicker? Please review them so they do not reveal any sensitive data.
journalctl -n 1000 -u snap.microk8s.daemon-apiserver-kicker
journalctl -n 1000 -u snap.microk8s.daemon-kubelite
journalctl -n 1000 -u snap.microk8s.daemon-k8s-dqlite
I am looking for a hint as to what the services are doing during this kind of incident.
Hi @ktsakalozos
Very much appreciate the response.
I have 3 manager nodes, so I've included the output from all 3 nodes.
node1-apiserver-kicker.log node1-dqlite.log node1-kubelite.log
node2-apiserver-kicker.log node2-dqlite.log node2-kubelite.log
node3-apiserver-kicker.log node3-dqlite.log node3-kubelite.log
I've restarted my master nodes, and I've been able to see some events.
9m57s Normal Starting node/node3 Starting kubelet.
9m57s Normal Starting node/node3
9m57s Warning InvalidDiskCapacity node/node3 invalid capacity 0 on image filesystem
9m57s Normal NodeAllocatableEnforced node/node3 Updated Node Allocatable limit across pods
9m57s Normal NodeHasSufficientMemory node/node3 Node node3 status is now: NodeHasSufficientMemory
9m57s Normal NodeHasNoDiskPressure node/node3 Node node3 status is now: NodeHasNoDiskPressure
9m57s Normal NodeHasSufficientPID node/node3 Node node3 status is now: NodeHasSufficientPID
8m14s Normal Starting node/node2 Starting kubelet.
8m13s Normal Starting node/node2
8m14s Warning InvalidDiskCapacity node/node2 invalid capacity 0 on image filesystem
8m14s Normal NodeHasSufficientMemory node/node2 Node node2 status is now: NodeHasSufficientMemory
8m14s Normal NodeHasNoDiskPressure node/node2 Node node2 status is now: NodeHasNoDiskPressure
8m14s Normal NodeHasSufficientPID node/node2 Node node2 status is now: NodeHasSufficientPID
8m14s Normal NodeAllocatableEnforced node/node2 Updated Node Allocatable limit across pods
6m48s Normal Starting node/node3 Starting kubelet.
6m47s Normal Starting node/node3
6m48s Warning InvalidDiskCapacity node/node3 invalid capacity 0 on image filesystem
6m47s Normal NodeAllocatableEnforced node/node3 Updated Node Allocatable limit across pods
6m47s Normal NodeHasSufficientMemory node/node3 Node node3 status is now: NodeHasSufficientMemory
6m47s Normal NodeHasNoDiskPressure node/node3 Node node3 status is now: NodeHasNoDiskPressure
6m47s Normal NodeHasSufficientPID node/node3 Node node3 status is now: NodeHasSufficientPID
5m7s Normal Starting node/node2 Starting kubelet.
5m6s Normal Starting node/node2
5m7s Warning InvalidDiskCapacity node/node2 invalid capacity 0 on image filesystem
5m7s Normal NodeAllocatableEnforced node/node2 Updated Node Allocatable limit across pods
5m7s Normal NodeHasSufficientMemory node/node2 Node node2 status is now: NodeHasSufficientMemory
5m7s Normal NodeHasNoDiskPressure node/node2 Node node2 status is now: NodeHasNoDiskPressure
5m7s Normal NodeHasSufficientPID node/node2 Node node2 status is now: NodeHasSufficientPID
4m59s Warning MissingClusterDNS node/node2 kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
3m42s Normal Starting node/node3 Starting kubelet.
3m41s Normal Starting node/node3
3m42s Warning InvalidDiskCapacity node/node3 invalid capacity 0 on image filesystem
3m42s Normal NodeAllocatableEnforced node/node3 Updated Node Allocatable limit across pods
3m42s Normal NodeHasSufficientMemory node/node3 Node node3 status is now: NodeHasSufficientMemory
3m42s Normal NodeHasNoDiskPressure node/node3 Node node3 status is now: NodeHasNoDiskPressure
3m42s Normal NodeHasSufficientPID node/node3 Node node3 status is now: NodeHasSufficientPID
114s Normal Starting node/node2 Starting kubelet.
113s Normal Starting node/node2
114s Warning InvalidDiskCapacity node/node2 invalid capacity 0 on image filesystem
114s Normal NodeHasSufficientMemory node/node2 Node node2 status is now: NodeHasSufficientMemory
114s Normal NodeHasNoDiskPressure node/node2 Node node2 status is now: NodeHasNoDiskPressure
114s Normal NodeHasSufficientPID node/node2 Node node2 status is now: NodeHasSufficientPID
114s Normal NodeAllocatableEnforced node/node2 Updated Node Allocatable limit across pods
46s Normal Starting node/node3 Starting kubelet.
45s Normal Starting node/node3
46s Warning InvalidDiskCapacity node/node3 invalid capacity 0 on image filesystem
So it looks like the kubelet service cannot start and is looping on all nodes. I've googled the invalid capacity 0 warning, and it initially looks like a false flag (plus it's a warning, not an error). I've looked in the kubelite logs for the nodes (like the ones above), but I'm not familiar enough with them to spot the root cause (or to know whether it's even logged in there).
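For completeness, this is roughly how I've been pulling the events and the kubelet-side logs above, in case I'm looking in the wrong place:
microk8s kubectl get events -A --sort-by=.lastTimestamp
sudo journalctl -u snap.microk8s.daemon-kubelite --since "1 hour ago" | grep -iE "error|fail"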
Let me know if there's any more information I can provide, and thanks again for any assistance.
Let me briefly describe how the k8s datastore works; hopefully this will allow us to pinpoint the reason for such long delays in accessing it.
The datastore in k8s is based on a consensus protocol, Raft. In an HA deployment (i.e. more than 2 nodes) a leader node is elected and two more nodes become voters. There has to be quorum among the three nodes (the leader and the voters) for an entry to be persisted in the datastore. When a write request reaches the datastore it is propagated to the leader, and the leader waits until a majority of the nodes have persisted the request data. When a read request hits the datastore it is forwarded to the leader and the leader responds to it.
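If you want to see which nodes currently hold which roles, the dqlite membership should be listed in the backend directory of the snap (path and role numbering from memory, so please double-check on your install; role 0 is a voter, 1 a stand-by, 2 a spare):
cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml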
The reasons why the datastore is slow can usually be traced to disk, network or CPU contention. If we have eliminated the disk issue we can move on to looking at network utilisation. One suggestion is to stop all storage services running on the voters and the leader. For example, I think you are running openebs. Is it possible openebs transfers large blocks, causing network hiccups? Can you try stopping openebs and/or moving it to worker nodes? Another approach is to have an external etcd on dedicated hardware to hold the k8s datastore.
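If you want to try that, something along these lines should scale the openebs control plane down to zero (the namespace name and the assumption that everything is a Deployment/StatefulSet are guesses on my part, so adjust for your setup):
microk8s kubectl -n openebs scale deployment --all --replicas=0
microk8s kubectl -n openebs scale statefulset --all --replicas=0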
Thanks for the response. I'm unfortunately not able to stop or move my openebs services, as I get timeout errors. I tried appending --timeout=120s to my kubectl commands, though it doesn't seem to change the timeout duration. Is there a way I can increase the timeout so I'm still able to manage the cluster during this period and start spinning down any running openebs services?
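For reference, this is the sort of thing I've been running; I may be mixing up --timeout (which some subcommands use as a wait period) with the global --request-timeout client flag, so the second form is just a guess at what I should be doing instead:
microk8s kubectl delete pod <pod-name> --timeout=120s
microk8s kubectl get pods -A --request-timeout=120s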
Thanks
Jumping onto this thread, albeit with borderline anecdotal evidence. I have reason to believe I've encountered this same issue, after running a dqlite HA microk8s cluster for about 4 months as a production home-lab cluster. The problem seemed to start a couple of days ago, with nodes randomly going NotReady and pods getting stuck across the whole cluster. The entire API server has been extremely slow to respond at all, to the point that I began attempting to recover the cluster.
Ultimately, what I tried was removing dqlite members with microk8s leave, running snap remove --purge microk8s, reinstalling microk8s on each node, and then joining back to the cluster in a rolling fashion. It maintained quorum the whole time; shortly after adding a 3rd node each time, it re-enabled HA mode with 3 voting members. But the problems never went away and my cluster was still broken, with workloads unable to fully restore. What I noticed when restarting dqlite masters was that whichever node became the leader consistently saw constant 100% single-core usage for the dqlite process, even when I disabled all other microk8s services like kubelite. I was able to trigger leader elections to move the leader around the cluster, even adding more powerful hardware in case this was just a raft backlog, but even on my machines with the best single-core performance the issue didn't go away. I even tried removing all master nodes except one to reduce it to a single-node cluster, but the constant 100% single-core usage remained.
So as it stands my cluster is effectively dead and unrecoverable. I tried restoring earlier backups but they had no effect on this issue. My cluster version was 1.26.8 on the 1.26/stable channel. RIP little guy; I'm going to move forward with building a new cluster on a different k8s distribution and restoring my volumes from offsite backup. Not that it matters, but I'm probably never going to use microk8s again.
I found the same issue as @jtcressy, but restoring the backup worked just fine. https://github.com/canonical/microk8s/issues/3706
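In my case the restore was just the built-in dqlite backup tooling, something along these lines (exact subcommand syntax from memory, so check microk8s dbctl --help first):
sudo microk8s dbctl backup my-backup.tar.gz
sudo microk8s dbctl restore my-backup.tar.gz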