microk8s doesn't seem to work on raspberry pi 3 or 4 devices
Hey there,
I've spent the day trying to get microk8s running on a variety of Raspberry Pis following the tutorial here, with a variety of different and interesting failures. I apologise if some of these are duplicates of existing issues; I'm in no way familiar with k8s and am flailing wildly.
Running raspbian aarch64 from here. Tested with microk8s stable and edge, both showing the same symptoms / collection of issues. uname -a shows:
Linux pi-k8s-01 5.10.17-v8+ #1414 SMP PREEMPT Fri Apr 30 13:23:25 BST 2021 aarch64 GNU/Linux
inspection-report-20210721_030553.tar.gz
Firstly, installing microk8s appears to work, and after a reboot microk8s status shows things are okay. Commands vary between completing instantly and taking tens of seconds, which may be related to #2280, though moving journald to volatile storage has not made a notable difference (it seems more likely to be related to the containerd restarts described later).
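(For reference, moving journald to volatile storage amounted to setting Storage=volatile in /etc/systemd/journald.conf and restarting the daemon; assuming the stock raspbian config, something like:)
sudo sed -i 's/^#\?Storage=.*/Storage=volatile/' /etc/systemd/journald.conf
sudo systemctl restart systemd-journald
Anyway, the status output after a reboot: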
pi@pi-k8s-01:~ $ microk8s status
microk8s is running
high-availability: no
datastore master nodes: 127.0.0.1:19001
datastore standby nodes: none
addons:
enabled:
ha-cluster # Configure high availability on the current node
disabled:
dashboard # The Kubernetes dashboard
dns # CoreDNS
helm # Helm 2 - the package manager for Kubernetes
helm3 # Helm 3 - Kubernetes package manager
host-access # Allow Pods connecting to Host services smoothly
ingress # Ingress controller for external access
linkerd # Linkerd is a service mesh for Kubernetes and other frameworks
metallb # Loadbalancer for your Kubernetes cluster
metrics-server # K8s Metrics Server for API access to service metrics
openebs # OpenEBS is the open-source storage solution for Kubernetes
portainer # Portainer UI for your Kubernetes cluster
prometheus # Prometheus operator for monitoring and logging
rbac # Role-Based Access Control for authorisation
registry # Private image registry exposed on localhost:32000
storage # Storage class; allocates storage from host directory
traefik # traefik Ingress controller for external access
Checking what's running under k8s, it looks like a lot of tasks are not ready, maybe related to #2367:
pi@pi-k8s-01:~ $ microk8s kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/calico-kube-controllers-f7868dd95-bf2st 0/1 Pending 0 101m
kube-system pod/calico-node-nzjzw 1/1 Running 39 101m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 101m
kube-system service/metrics-server ClusterIP 10.152.183.23 <none> 443/TCP 96m
kube-system service/kubernetes-dashboard ClusterIP 10.152.183.185 <none> 443/TCP 35m
kube-system service/dashboard-metrics-scraper ClusterIP 10.152.183.179 <none> 8000/TCP 35m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/calico-node 1 1 0 1 0 kubernetes.io/os=linux 101m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/calico-kube-controllers 0/1 1 0 101m
kube-system deployment.apps/metrics-server 0/1 0 0 96m
kube-system deployment.apps/kubernetes-dashboard 0/1 0 0 35m
kube-system deployment.apps/dashboard-metrics-scraper 0/1 0 0 35m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/calico-kube-controllers-f7868dd95 1 1 0 101m
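(Presumably microk8s kubectl -n kube-system describe pod calico-kube-controllers-f7868dd95-bf2st would say why the controller is stuck Pending, but as below, even running commands is hit and miss.)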
Running commands has a 50/50 chance of failure, presumably related to #1916 (and maybe #2280), though this node is not joined to a cluster, nor does anything appear to be getting OOM-killed. This usually returns one of a few errors:
The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?
Attempting to install the dashboard with microk8s enable dashboard sometimes seems to work, and in other cases appears to kill the snap.microk8s.daemon-kubelite service, seemingly requiring a restart to recover. When it does claim to succeed, the containers never seem to run (as you can see above), nor does microk8s status report that the dashboard is enabled.
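(By a restart I mean something like microk8s stop followed by microk8s start; failing that, a full reboot of the Pi.)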
Attempting to forward a port to the dashboard machine (via microk8s kubectl port-forward -n kube-system service/kubernetes-dashboard 10443:443) first results in:
error: watch closed before UntilWithoutRetry timeout
which is probably to be expected if the container isn't up. Then if retried:
The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?
which appears to also crash something that takes a while to recover.
A few errors do end up in the logs, but I haven't had much luck resolving them. A sampling of unique entries via sudo journalctl -u snap.microk8s.daemon-* --all | grep error | tail -500:
Jul 21 03:23:33 pi-k8s-01 microk8s.daemon-containerd[29649]: time="2021-07-21T03:23:33.486142810+01:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exit status 1 \"modprobe: FATAL: Module aufs not found in directory /lib/modules/5.10.17-v8+\\n\"): skip plugin" type=io.containerd.snapshotter.v1
Jul 21 03:23:33 pi-k8s-01 microk8s.daemon-containerd[29649]: time="2021-07-21T03:23:33.486893901+01:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.btrfs (ext4) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
Jul 21 03:23:33 pi-k8s-01 microk8s.daemon-containerd[29649]: time="2021-07-21T03:23:33.487056063+01:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
Jul 21 03:23:33 pi-k8s-01 microk8s.daemon-containerd[29649]: time="2021-07-21T03:23:33.488229827+01:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
Jul 21 03:24:25 pi-k8s-01 microk8s.daemon-containerd[29921]: time="2021-07-21T03:24:25.694509662+01:00" level=error msg="failed to delete" cmd="/snap/microk8s/2343/bin/containerd-shim-runc-v1 -namespace k8s.io -address /var/snap/microk8s/common/run/containerd.sock -publish-binary /snap/microk8s/2343/bin/containerd -id e9fed7097d0b2493e8483a08e43e42d19145bd881e064b181ec11d8a7831e7d3 -bundle /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/e9fed7097d0b2493e8483a08e43e42d19145bd881e064b181ec11d8a7831e7d3 delete" error="exit status 1"
...
Jul 21 03:24:15 pi-k8s-01 microk8s.daemon-kubelite[29977]: E0721 03:24:15.068578 29977 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
Jul 21 03:24:25 pi-k8s-01 microk8s.daemon-containerd[29921]: time="2021-07-21T03:24:25.695113389+01:00" level=warning msg="failed to clean up after shim disconnected" error="io.containerd.runc.v1: remove /run/containerd/s/bf95c1279ce968440d9923564b9437c60297e8f859ed59ea05f45fe7462c2cc7: no such file or directory\n: exit status 1" id=e9fed7097d0b2493e8483a08e43e42d19145bd881e064b181ec11d8a7831e7d3 namespace=k8s.io
Jul 21 03:24:28 pi-k8s-01 microk8s.daemon-containerd[29921]: time="2021-07-21T03:24:28.685861940+01:00" level=error msg="failed to reload cni configuration after receiving fs change event(\"/var/snap/microk8s/2343/args/cni-network/10-calico.conflist\": REMOVE)" error="cni config load failed: no network config found in /var/snap/microk8s/2343/args/cni-network: cni plugin not initialized: failed to load cni config"
Jul 21 03:24:33 pi-k8s-01 microk8s.daemon-containerd[29921]: time="2021-07-21T03:24:33.723080957+01:00" level=error msg="collecting metrics for 053fcd20e91ca8cc84f8b3dbfe64cb59d09b77d13efbd16efe35c277156d3399" error="cgroups: cgroup deleted: unknown"
...
Jul 21 03:32:18 pi-k8s-01 microk8s.daemon-kubelite[7740]: E0721 03:32:18.774730 7740 status.go:71] apiserver received an error that is not an metav1.Status: &url.Error{Op:"Get", URL:"https://pi-k8s-01:10250/containerLogs/kube-system/calico-node-nzjzw/upgrade-ipam", Err:(*net.OpError)(0x40013f3db0)}: Get "https://pi-k8s-01:10250/containerLogs/kube-system/calico-node-nzjzw/upgrade-ipam": dial tcp 127.0.0.1:10250: connect: connection refused
Most of these errors repeat constantly. The "exit status 1" seems to suggest containerd is restarting, but there doesn't seem to be any obvious indication of a fresh start in the logs.
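(Counting service starts with something like sudo journalctl -u snap.microk8s.daemon-containerd --all | grep -c 'Started Service' might confirm whether it is restarting; I haven't dug into that further yet.)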
I hope some part of this is useful. I'd hoped for a quick morning setup but seem to have bitten off more than I can chew ^_^
Hi @ryankurte,
I see that the CNI (calico-node) is not healthy; it has restarted 39 times in 101 minutes. As the CNI is not healthy, there is no networking for the Kubernetes pods, so it is expected that the rest of the services will not work.
This machine has 2GB of RAM and you have added 100MB of swap. With this amount of swap, it is as if there is none.
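If you want to keep the current setup, you could first try giving the node a meaningful amount of swap. On Raspbian the swap file is usually managed by dphys-swapfile, so something along these lines should bump it to 1GB:
sudo sed -i 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=1024/' /etc/dphys-swapfile
sudo systemctl restart dphys-swapfile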
My main suggestion, though: run microk8s disable ha-cluster. This will switch you to a setup with etcd instead of dqlite and flannel instead of calico.
thanks for the response! after running docker it didn't even occur to me that the runtime would need > 2GB of ram; is this too low to be worthwhile? (i actually didn't intend the swap to be there at all, the rpi seems to have added it automatically on filesystem expansion; i may have missed recommendations about this?)
after disabling ha-cluster it's idling around ~500MB of memory used (on a base of ~200MB without it running), which seems like it should be workable, and after microk8s enable dashboard it goes up to ~600MB...
microk8s status however still reports:
microk8s is not running. Use microk8s inspect for a deeper inspection.
and the services don't seem to be running (or ready, anyway).
$ microk8s kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/metrics-server-698f47cc84-nzwks 0/1 Running 6 13m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 16m
kube-system service/dashboard-metrics-scraper ClusterIP 10.152.183.20 <none> 8000/TCP 13m
kube-system service/kubernetes-dashboard ClusterIP 10.152.183.200 <none> 443/TCP 13m
kube-system service/metrics-server ClusterIP 10.152.183.169 <none> 443/TCP 13m
while microk8s inspect seems to suggest everything is okay:
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting juju
Inspect Juju
Inspecting kubeflow
Inspect Kubeflow
inspection-report-20210722_021452.tar.gz
ram use is bouncing around a lot so i guess something is still failing / restarting, and it sorta looks like systemd is stopping and starting microk8s.daemon-containerd every few seconds or so:
Jul 22 02:22:09 pi-k8s-01 systemd[1]: Started Service for snap application microk8s.daemon-containerd.
Jul 22 02:22:09 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:09.748621182+01:00" level=info msg="Start event monitor"
Jul 22 02:22:09 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:09.748726828+01:00" level=info msg="Start snapshots syncer"
Jul 22 02:22:09 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:09.748766124+01:00" level=info msg="Start cni network conf syncer"
Jul 22 02:22:09 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:09.748795271+01:00" level=info msg="Start streaming server"
Jul 22 02:22:25 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:25.330933873+01:00" level=info msg="Stop CRI service"
Jul 22 02:22:25 pi-k8s-01 systemd[1]: Stopping Service for snap application microk8s.daemon-containerd...
Jul 22 02:22:25 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:25.345294578+01:00" level=info msg="Stop CRI service"
Jul 22 02:22:25 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:25.345431743+01:00" level=info msg="Event monitor stopped"
Jul 22 02:22:25 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:25.345469353+01:00" level=info msg="Stream server stopped"
Jul 22 02:22:25 pi-k8s-01 systemd[1]: snap.microk8s.daemon-containerd.service: Succeeded.
Jul 22 02:22:25 pi-k8s-01 systemd[1]: Stopped Service for snap application microk8s.daemon-containerd.
Jul 22 02:22:25 pi-k8s-01 systemd[1]: Starting Service for snap application microk8s.daemon-containerd...
dmesg also shows the virtual network adaptor bouncing around, but not much else:
[84269.623924] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[84269.625245] cni0: port 1(vetha72a21b7) entered blocking state
[84269.625268] cni0: port 1(vetha72a21b7) entered disabled state
[84269.625561] device vetha72a21b7 entered promiscuous mode
[84269.625662] cni0: port 1(vetha72a21b7) entered blocking state
[84269.625668] cni0: port 1(vetha72a21b7) entered forwarding state
[84320.885230] cni0: port 1(vetha72a21b7) entered disabled state
systemd reports that etcd is up, but flanneld is failing with:
failed to ensure address of interface flannel.1: link has incompatible addresses. Remove additional addresses and try again
so i'm just looking into that now, fingers crossed ^_^
expanded log (via sudo journalctl -u snap.microk8s.daemon-flanneld.service --all -n 500):
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.516644 14053 main.go:514] Determining IP address of default interface
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.517697 14053 main.go:527] Using interface with name eth0 and address 192.168.1.54
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.517760 14053 main.go:544] Defaulting external address to interface address (192.168.1.54)
Jul 22 02:31:33 pi-k8s-01 flanneld[14053]: warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.519699 14053 main.go:244] Created subnet manager: Etcd Local Manager with Previous Subnet: 10.1.43.0/24
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.519734 14053 main.go:247] Installing signal handlers
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.599360 14053 main.go:386] Found network config - Backend type: vxlan
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.599538 14053 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.607213 14053 local_manager.go:147] Found lease (10.1.43.0/24) for current IP (192.168.1.54), reusing
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: E0722 02:31:33.614657 14053 main.go:289] Error registering network: failed to configure interface flannel.1: failed to ensure address of interface flannel.1: link has incompatible addresses. Remove additional addresses and try again
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.615088 14053 main.go:366] Stopping shutdownHandler...
Jul 22 02:31:33 pi-k8s-01 systemd[1]: snap.microk8s.daemon-flanneld.service: Main process exited, code=exited, status=1/FAILURE
Jul 22 02:31:33 pi-k8s-01 systemd[1]: snap.microk8s.daemon-flanneld.service: Failed with result 'exit-code'.
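based on the error text, the first thing i'll try is removing the stale interface so flanneld can recreate it (assuming nothing else owns flannel.1), something like:
sudo ip link delete flannel.1
sudo systemctl restart snap.microk8s.daemon-flanneld.service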
I've been happily running K8s on a cluster of RPi 2s and 3s for years, but as of (roughly) 1.17 it seems that Kubernetes just can't run on these machines anymore. I have similar results with microk8s and k3s. 1.17 kind of runs, but the master just dies after a while, once you've deployed one or more actual apps (like cert-manager and ArgoCD). With 1.21 I can't even get to k get nodes; it never really starts up fully, with either k3s or microk8s.
Guess I'm going to have to find something else to do with my RPis. I don't want to upgrade to RPi 4s as they run too hot; I have the cluster in my office and I don't want a rack of fans running next to me all day.
@teq0 i've subsequently got k3s running via k3sup on a cluster of Pi 3s without any issue (though the leader is a Pi 4) on raspbian aarch64; it's not heavily loaded, but it seems to be running alright.
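for reference the k3sup setup was essentially the stock one, something like this (ips hypothetical):
k3sup install --ip 192.168.1.50 --user pi
k3sup join --ip 192.168.1.51 --server-ip 192.168.1.50 --user pi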
I was about to try that option myself. Given that microk8s is multi-master (if I'm reading the docs correctly) I don't think it will work with a mix of 3s and 4s, but single-master k3s should probably work.
Update: RPi4 as master and RPi3s as agents seems to work fine with k3s. When I get time I might try the same with microk8s.
Having similar issues as well. I'll switch to k3s to see if that works better.
@copiltembel @teq0 if you do not want to have HA (multi-master) you can microk8s disable ha-cluster on your nodes before joining them. In this setup only the first node will run the control plane.
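Roughly, on a fresh install that would be:
microk8s disable ha-cluster # on every node, before joining
microk8s add-node # on the first node; this prints a join command with a token
microk8s join <first-node-ip>:25000/<token> # on each other node, using the printed join command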
@ktsakalozos I've disabled ha-cluster and will give it another go.
I have a mixed setup with two nodes running on rpi4 (arm64) and a single amd64 machine. From what I read, HA was enabled as soon as there were 3 nodes in the cluster. But my amd64 machine was mostly off; maybe that caused microk8s to fail.
I have spent 2 days figuring out why my Raspberry Pi 4 (4GB memory) was refusing connections. I ran exactly the same commands on x64 and aarch64, and tried almost everything I could find on the internet. After executing microk8s disable ha-cluster I was able to run microk8s.status, and it informed me that microk8s was running; after microk8s.kubectl get cs I got a healthy status for all components. It is strange that I had no such problems on x64 :octocat:
So the next day I had to reinstall again, because it started showing "connection refused" again; the problem still exists.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.