microk8s
microk8s add-node fails (without error)
Summary
We tried to create a small microk8s cluster from two existing Ubuntu systems for testing purposes. There appears to be an issue with microk8s where a node cannot join a cluster even though the join command reports success, and microk8s keeps crashing on the second node after the join attempt. We had a similar problem with two other systems before. At that time we resolved it by reinstalling from a fresh Ubuntu server image. However, the problem seems to have reappeared.
What Should Happen Instead?
Microk8s nodes should be able to successfully join a cluster, and microk8s should not crash on the second node. Workloads should be able to start without pods getting stuck in the "ContainerCreating" state.
Reproduction Steps
- Take two existing Ubuntu PCs and install snapd and microk8s (channel 1.29/stable with classic confinement) on both.
- Try to join the nodes: run `microk8s add-node` on the first node to generate the join command.
- Run the join command on the second node, which should finish successfully.
  a. The command finishes successfully:
        microk8s join 192.168.0.100:25000/<redacted>
        Contacting cluster at 192.168.0.100
        Waiting for this node to finish joining the cluster. .. .. .. ..
        Successfully joined the cluster.
- Notice that `kubectl get no` does not list the new node, indicating that the node has not actually joined the cluster.
  a. Output of `kubectl get no`:
        NAME   STATUS   ROLES    AGE   VERSION
        ws15   Ready    <none>   56m   v1.29.4
- Observe that microk8s keeps crashing on the second node, with kubelite appearing to be the culprit based on the journalctl logs.
  a. Errors:
        Jul 23 12:28:58 ws14 microk8s.daemon-kubelite[68619]: W0723 12:28:58.605269 68619 reflector.go:539] k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1.PodDisruptionBudget: Get "https://127.0.0.1:16443/apis/policy/v1/poddisruptionbudgets?limit=500&resourceVersion=0": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "10.152.183.1")
        Jul 23 12:28:58 ws14 microk8s.daemon-kubelite[68619]: E0723 12:28:58.605368 68619 reflector.go:147] k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: Get "https://127.0.0.1:16443/apis/policy/v1/poddisruptionbudgets?limit=500&resourceVersion=0": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "10.152.183.1")
        Jul 23 12:28:58 ws14 microk8s.daemon-kubelite[68619]: E0723 12:28:58.748636 68619 authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, square/go-jose: error in cryptographic primitive]"
        Jul 23 12:28:58 ws14 microk8s.daemon-kubelite[68619]: E0723 12:28:58.948778 68619 authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, square/go-jose: error in cryptographic primitive]"
        Jul 23 12:28:59 ws14 microk8s.daemon-kubelite[68619]: E0723 12:28:59.149109 68619 authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, square/go-jose: error in cryptographic primitive]"
        Jul 23 12:28:59 ws14 microk8s.daemon-kubelite[68619]: E0723 12:28:59.265682 68619 authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, square/go-jose: error in cryptographic primitive]"
- On the master node, notice some errors in kubelite as well, although `microk8s status` returns just fine.
  a. Errors:
        ./inspection-report/snap.microk8s.daemon-kubelite/journal.log:Jul 23 11:41:55 ws15 microk8s.daemon-kubelite[647979]: W0723 11:41:55.632296 647979 logging.go:59] [core] [Channel #135 SubChannel #136] grpc: addrConn.createTransport failed to connect to {Addr: "unix:///var/snap/microk8s/6809/var/kubernetes/backend/kine.sock:12379", ServerName: "kine.sock:12379", }. Err: connection error: desc = "transport: Error while dialing: dial unix /var/snap/microk8s/6809/var/kubernetes/backend/kine.sock:12379: connect: connection refused"
        ./inspection-report/snap.microk8s.daemon-kubelite/journal.log:Jul 23 11:41:57 ws15 microk8s.daemon-kubelite[647979]: W0723 11:41:57.631401 647979 logging.go:59] [core] [Channel #72 SubChannel #73] grpc: addrConn.createTransport failed to connect to {Addr: "unix:///var/snap/microk8s/6809/var/kubernetes/backend/kine.sock:12379", ServerName: "kine.sock:12379", }. Err: connection error: desc = "transport: Error while dialing: dial unix /var/snap/microk8s/6809/var/kubernetes/backend/kine.sock:12379: connect: connection refused"
- When trying to start a workload on the master node, the pod gets stuck in the "ContainerCreating" state, and `kubectl describe pod` shows events indicating that the pod sandbox changed and will be killed and re-created.
  a. Output of `kubectl describe pod`:
        Events:
          Type    Reason          Age                   From     Message
          ----    ------          ----                  ----     -------
          Normal  SandboxChanged  3m41s (x50 over 13m)  kubelet  Pod sandbox changed, it will be killed and re-created.
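The check from the steps above (a join that reports success while the node never appears in the node list) can be scripted. The following is only a minimal sketch and not part of the original report: `node_listed` is a hypothetical helper name, and it assumes the default tabular output of `kubectl get no`, with a header row and the node name in the first column.

```shell
#!/bin/sh
# Hypothetical helper (not a microk8s command): reads `kubectl get no`
# output on stdin and succeeds only if the given node name appears in
# the first column of a non-header row.
node_listed() {
    # $1: node name to look for, e.g. ws14
    awk -v node="$1" 'NR > 1 && $1 == node { found = 1 } END { exit !found }'
}

# Possible usage on the first node (assumes microk8s is installed there):
#   microk8s kubectl get no | node_listed ws14 || echo "ws14 is not in the cluster"
```

Fed the `kubectl get no` output quoted above, the helper would succeed for `ws15` and fail for `ws14`, which matches what we observed.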
Introspection Report
Start (fresh install of microk8s with `snap remove --purge`)
- Master Node (fresh install): master_node_fresh_install_1721727270173_0.tar.gz
- Second Node (fresh install): second_node_fresh_install_1721727654608_0.tar.gz
State after running `microk8s join` from the second node
- Master Node (after join): master_node_after_join_1721730983156_0.tar.gz
- Second Node (after join): second_node_after_join_1721730623791_0.tar.gz
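To compare the kubelite journals across these reports, something like the following could be run after extracting a tarball. This is a sketch under assumptions: `count_kubelite_matches` is a hypothetical helper name, and the layout `inspection-report/snap.microk8s.daemon-kubelite/journal.log` is taken from the paths quoted in the errors above; other microk8s versions may lay out the report differently.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in the kubelite journal of an
# extracted inspection report that contain a fixed string.
count_kubelite_matches() {
    # $1: directory the tarball was extracted into
    # $2: fixed string to search for (not a regular expression)
    journal="$1/inspection-report/snap.microk8s.daemon-kubelite/journal.log"
    if [ ! -r "$journal" ]; then
        echo "no readable kubelite journal under $1" >&2
        return 2
    fi
    # grep -c prints the count but exits non-zero when it is 0;
    # `|| true` keeps the helper's exit status at 0 in that case.
    grep -cF -- "$2" "$journal" || true
}

# Possible usage (directory name is an example only):
#   mkdir second_after && tar -xzf second_node_after_join_1721730623791_0.tar.gz -C second_after
#   count_kubelite_matches second_after 'x509: certificate signed by unknown authority'
```

Running the same count against the fresh-install and after-join reports of one node would show whether the certificate errors only start after `microk8s join`.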
Additional System Info
- Setup: For this test we created a small network using a router and configured both machines with static IP addresses. Both machines can access the internet via NAT.
- `uname -a`:
      Linux ws15 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:14:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Release:
      Distributor ID: Ubuntu
      Description:    Ubuntu 22.04.4 LTS
      Release:        22.04
      Codename:       jammy
- `sudo apt show snapd`:
      Package: snapd
      Version: 2.63+22.04
      Built-Using: apparmor (= 3.0.4-2ubuntu2.4), libcap2 (= 1:2.44-1ubuntu0.22.04.1), libseccomp (= 2.5.3-2ubuntu2)
      Priority: optional
      Section: devel
      Origin: Ubuntu
      Maintainer: Ubuntu Developers <[email protected]>
      Bugs: https://bugs.launchpad.net/ubuntu/+filebug
      Installed-Size: 104 MB
      Depends: adduser, apparmor (>= 2.10.95-0ubuntu2.2), ca-certificates, fuse3 (>= 3.10.5-1) | fuse, openssh-client, squashfs-tools, systemd, udev, default-dbus-session-bus | dbus-session-bus, libc6 (>= 2.34), libfuse3-3 (>= 3.2.3), liblzma5 (>= 5.1.1alpha+20120614), liblzo2-2 (>= 2.02), libudev1 (>= 183), zlib1g (>= 1:1.1.4)
      Recommends: gnupg
      Suggests: zenity | kdialog
      Conflicts: snap (<< 2013-11-29-1ubuntu1)
      Breaks: snap-confine (<< 2.23), snapd-xdg-open (<= 0.0.0), ubuntu-core-launcher (<< 2.22), ubuntu-snappy (<< 1.9), ubuntu-snappy-cli (<< 1.9)
      Replaces: snap-confine (<< 2.23), snapd-xdg-open (<= 0.0.0), ubuntu-core-launcher (<< 2.22), ubuntu-snappy (<< 1.9), ubuntu-snappy-cli (<< 1.9)
      Homepage: https://github.com/snapcore/snapd
      Task: server-minimal, ubuntu-desktop-minimal, ubuntu-desktop, cloud-image, ubuntu-desktop-raspi, ubuntu-wsl, server, ubuntu-server-raspi, kubuntu-desktop, xubuntu-core, xubuntu-desktop, lubuntu-desktop, ubuntustudio-desktop-core, ubuntustudio-desktop, ubuntukylin-desktop, ubuntu-mate-core, ubuntu-mate-desktop, ubuntu-budgie-desktop, ubuntu-budgie-desktop-raspi
      Download-Size: 25,9 MB
      APT-Manual-Installed: yes
      APT-Sources: http://de.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
Can you suggest a fix?
Unfortunately no.
Are you interested in contributing with a fix?
I will gladly help, if I can.