
microk8s doesn't seem to work on raspberry pi 3 or 4 devices

Open · ryankurte opened this issue 4 years ago · 10 comments

Hey there,

I've spent the day trying to get microk8s running on a variety of Raspberry Pis following the tutorial here, with a variety of different and interesting failures. I apologise if some of these are duplicates; I am in no way familiar with k8s and am flailing wildly.

Running raspbian aarch64 from here. Tested with microk8s stable and edge, both showing the same symptoms / collection of issues. uname -a shows:

Linux pi-k8s-01 5.10.17-v8+ #1414 SMP PREEMPT Fri Apr 30 13:23:25 BST 2021 aarch64 GNU/Linux

inspection-report-20210721_030553.tar.gz

Firstly, installing microk8s appears to work, and after a reboot microk8s status shows things are okay. Commands vary between completing instantly and taking tens of seconds, which may be related to #2280, though moving journald to volatile storage has not made a notable difference (it seems maybe related to the container restarting later).

pi@pi-k8s-01:~ $ microk8s status
microk8s is running
high-availability: no
  datastore master nodes: 127.0.0.1:19001
  datastore standby nodes: none
addons:
  enabled:
    ha-cluster           # Configure high availability on the current node
  disabled:
    dashboard            # The Kubernetes dashboard
    dns                  # CoreDNS
    helm                 # Helm 2 - the package manager for Kubernetes
    helm3                # Helm 3 - Kubernetes package manager
    host-access          # Allow Pods connecting to Host services smoothly
    ingress              # Ingress controller for external access
    linkerd              # Linkerd is a service mesh for Kubernetes and other frameworks
    metallb              # Loadbalancer for your Kubernetes cluster
    metrics-server       # K8s Metrics Server for API access to service metrics
    openebs              # OpenEBS is the open-source storage solution for Kubernetes
    portainer            # Portainer UI for your Kubernetes cluster
    prometheus           # Prometheus operator for monitoring and logging
    rbac                 # Role-Based Access Control for authorisation
    registry             # Private image registry exposed on localhost:32000
    storage              # Storage class; allocates storage from host directory
    traefik              # traefik Ingress controller for external access

Checking what's running under k8s, it looks like a lot of tasks are not ready, maybe related to #2367:

pi@pi-k8s-01:~ $ microk8s kubectl get all --all-namespaces
NAMESPACE     NAME                                          READY   STATUS    RESTARTS   AGE
kube-system   pod/calico-kube-controllers-f7868dd95-bf2st   0/1     Pending   0          101m
kube-system   pod/calico-node-nzjzw                         1/1     Running   39         101m

NAMESPACE     NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
default       service/kubernetes                  ClusterIP   10.152.183.1     <none>        443/TCP    101m
kube-system   service/metrics-server              ClusterIP   10.152.183.23    <none>        443/TCP    96m
kube-system   service/kubernetes-dashboard        ClusterIP   10.152.183.185   <none>        443/TCP    35m
kube-system   service/dashboard-metrics-scraper   ClusterIP   10.152.183.179   <none>        8000/TCP   35m

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/calico-node   1         1         0       1            0           kubernetes.io/os=linux   101m

NAMESPACE     NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/calico-kube-controllers     0/1     1            0           101m
kube-system   deployment.apps/metrics-server              0/1     0            0           96m
kube-system   deployment.apps/kubernetes-dashboard        0/1     0            0           35m
kube-system   deployment.apps/dashboard-metrics-scraper   0/1     0            0           35m

NAMESPACE     NAME                                                DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/calico-kube-controllers-f7868dd95   1         1         0       101m

Running commands has a 50/50 chance of failure, presumably related to #1916 (and maybe #2280), though this node is not joined to a cluster nor does anything appear to be getting OOM killed. This usually returns one of a few errors:

The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?

Attempting to install the dashboard with microk8s enable dashboard sometimes seems to work, and in other cases appears to kill snap.microk8s.daemon-kubelite, seemingly requiring a restart to recover. When it does claim to succeed, the containers never seem to run (as you can see above), nor does microk8s status report that the dashboard is enabled.

Attempting to forward a port to the dashboard service (via microk8s kubectl port-forward -n kube-system service/kubernetes-dashboard 10443:443) first results in:

error: watch closed before UntilWithoutRetry timeout

which is probably to be expected if the container isn't up. Then if retried:

The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?

which appears to also crash something that takes a while to recover.

A few errors do end up in the logs, but I haven't had much luck resolving them. A sampling of unique entries via sudo journalctl -u snap.microk8s.daemon-* --all | grep error | tail -500:

Jul 21 03:23:33 pi-k8s-01 microk8s.daemon-containerd[29649]: time="2021-07-21T03:23:33.486142810+01:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exit status 1 \"modprobe: FATAL: Module aufs not found in directory /lib/modules/5.10.17-v8+\\n\"): skip plugin" type=io.containerd.snapshotter.v1
Jul 21 03:23:33 pi-k8s-01 microk8s.daemon-containerd[29649]: time="2021-07-21T03:23:33.486893901+01:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.btrfs (ext4) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
Jul 21 03:23:33 pi-k8s-01 microk8s.daemon-containerd[29649]: time="2021-07-21T03:23:33.487056063+01:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
Jul 21 03:23:33 pi-k8s-01 microk8s.daemon-containerd[29649]: time="2021-07-21T03:23:33.488229827+01:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
Jul 21 03:24:25 pi-k8s-01 microk8s.daemon-containerd[29921]: time="2021-07-21T03:24:25.694509662+01:00" level=error msg="failed to delete" cmd="/snap/microk8s/2343/bin/containerd-shim-runc-v1 -namespace k8s.io -address /var/snap/microk8s/common/run/containerd.sock -publish-binary /snap/microk8s/2343/bin/containerd -id e9fed7097d0b2493e8483a08e43e42d19145bd881e064b181ec11d8a7831e7d3 -bundle /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/e9fed7097d0b2493e8483a08e43e42d19145bd881e064b181ec11d8a7831e7d3 delete" error="exit status 1"
...
Jul 21 03:24:15 pi-k8s-01 microk8s.daemon-kubelite[29977]: E0721 03:24:15.068578   29977 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
Jul 21 03:24:25 pi-k8s-01 microk8s.daemon-containerd[29921]: time="2021-07-21T03:24:25.694509662+01:00" level=error msg="failed to delete" cmd="/snap/microk8s/2343/bin/containerd-shim-runc-v1 -namespace k8s.io -address /var/snap/microk8s/common/run/containerd.sock -publish-binary /snap/microk8s/2343/bin/containerd -id e9fed7097d0b2493e8483a08e43e42d19145bd881e064b181ec11d8a7831e7d3 -bundle /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/e9fed7097d0b2493e8483a08e43e42d19145bd881e064b181ec11d8a7831e7d3 delete" error="exit status 1"
Jul 21 03:24:25 pi-k8s-01 microk8s.daemon-containerd[29921]: time="2021-07-21T03:24:25.695113389+01:00" level=warning msg="failed to clean up after shim disconnected" error="io.containerd.runc.v1: remove /run/containerd/s/bf95c1279ce968440d9923564b9437c60297e8f859ed59ea05f45fe7462c2cc7: no such file or directory\n: exit status 1" id=e9fed7097d0b2493e8483a08e43e42d19145bd881e064b181ec11d8a7831e7d3 namespace=k8s.io
Jul 21 03:24:28 pi-k8s-01 microk8s.daemon-containerd[29921]: time="2021-07-21T03:24:28.685861940+01:00" level=error msg="failed to reload cni configuration after receiving fs change event(\"/var/snap/microk8s/2343/args/cni-network/10-calico.conflist\": REMOVE)" error="cni config load failed: no network config found in /var/snap/microk8s/2343/args/cni-network: cni plugin not initialized: failed to load cni config"
Jul 21 03:24:33 pi-k8s-01 microk8s.daemon-containerd[29921]: time="2021-07-21T03:24:33.723080957+01:00" level=error msg="collecting metrics for 053fcd20e91ca8cc84f8b3dbfe64cb59d09b77d13efbd16efe35c277156d3399" error="cgroups: cgroup deleted: unknown"
...
Jul 21 03:32:18 pi-k8s-01 microk8s.daemon-kubelite[7740]: E0721 03:32:18.774730    7740 status.go:71] apiserver received an error that is not an metav1.Status: &url.Error{Op:"Get", URL:"https://pi-k8s-01:10250/containerLogs/kube-system/calico-node-nzjzw/upgrade-ipam", Err:(*net.OpError)(0x40013f3db0)}: Get "https://pi-k8s-01:10250/containerLogs/kube-system/calico-node-nzjzw/upgrade-ipam": dial tcp 127.0.0.1:10250: connect: connection refused

Most of these errors are repeated constantly. The "exit status 1" seems to suggest containerd is restarting, but there doesn't seem to be any obvious indication of a fresh start in the logs.
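To get a better sense of which messages are actually distinct, a dedup pipeline along these lines can help (a sketch; the sed pattern assumes the default journal timestamp/pid prefix format):

```shell
# Collapse repeated errors: strip the "Mon DD HH:MM:SS host unit[pid]: " prefix,
# then count the unique remaining messages, most frequent first
sudo journalctl -u 'snap.microk8s.daemon-*' --all \
  | grep -i error \
  | sed -E 's/^[A-Z][a-z]{2} [0-9 ]+ [0-9:]+ \S+ \S+\[[0-9]+\]: //' \
  | sort | uniq -c | sort -rn | head -20
```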

I hope some part of this is useful, I'd hoped for it to be a quick morning setup but seem to have bitten off more than I can chew ^_^

ryankurte avatar Jul 21 '21 02:07 ryankurte

Hi @ryankurte,

I see that the CNI (calico-node) is not healthy, it has restarted 39 times in 101 minutes. As the CNI is not healthy there is no networking for the Kubernetes pods so it is expected for the rest of the services to not work.

This machine has 2GB of RAM and you have added 100MB of swap. With this amount of swap it is as if there is none.
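If you do want meaningful swap, on Raspberry Pi OS it is typically managed by dphys-swapfile; a sketch of raising it (assuming the default dphys-swapfile service is what created your 100MB swap — untested on your exact image):

```shell
# Raise swap from the default 100MB to 2GB via dphys-swapfile
sudo dphys-swapfile swapoff
sudo sed -i 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=2048/' /etc/dphys-swapfile
sudo dphys-swapfile setup    # recreate the swap file at the new size
sudo dphys-swapfile swapon
free -h                      # verify the new swap total
```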

Here is a suggestion: microk8s disable ha-cluster. This will switch you to a setup with etcd instead of dqlite and flannel instead of calico.
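In practice that switch looks something like this (note: disabling ha-cluster resets the local cluster state, so it is best done on a node you can rebuild):

```shell
# Switch datastore/CNI: dqlite+calico -> etcd+flannel
microk8s disable ha-cluster     # resets local cluster state
microk8s status --wait-ready    # block until the services report ready
microk8s kubectl get pods -A    # confirm kube-system pods settle
```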

ktsakalozos avatar Jul 21 '21 09:07 ktsakalozos

thanks for the response! after running docker it didn't even occur to me that the runtime would need > 2GB of ram - is this too little to be worthwhile? (i actually didn't intend the swap to be there at all, the rpi seems to have automatically added it on file system expand, i may have missed recommendations about this?)

after disabling ha-cluster it's idling around ~500MB memory used (on a base of ~200MB without it running) which seems like it should be workable, and after microk8s enable dashboard it goes up to 600MB...

microk8s status however still reports:

microk8s is not running. Use microk8s inspect for a deeper inspection.

and the services don't seem to be running (or, ready, anyway).

$ microk8s kubectl get all --all-namespaces
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
kube-system   pod/metrics-server-698f47cc84-nzwks   0/1     Running   6          13m

NAMESPACE     NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
default       service/kubernetes                  ClusterIP   10.152.183.1     <none>        443/TCP    16m
kube-system   service/dashboard-metrics-scraper   ClusterIP   10.152.183.20    <none>        8000/TCP   13m
kube-system   service/kubernetes-dashboard        ClusterIP   10.152.183.200   <none>        443/TCP    13m
kube-system   service/metrics-server              ClusterIP   10.152.183.169   <none>        443/TCP    13m

while microk8s inspect seems to suggest everything is okay:

Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-kubelite is running
  Service snap.microk8s.daemon-flanneld is running
  Service snap.microk8s.daemon-etcd is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow

inspection-report-20210722_021452.tar.gz

ram use is bouncing around a lot so i guess something is still failing / restarting, and it sorta looks like systemd is stopping and starting microk8s.daemon-containerd every few seconds or so:

Jul 22 02:22:09 pi-k8s-01 systemd[1]: Started Service for snap application microk8s.daemon-containerd.
Jul 22 02:22:09 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:09.748621182+01:00" level=info msg="Start event monitor"
Jul 22 02:22:09 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:09.748726828+01:00" level=info msg="Start snapshots syncer"
Jul 22 02:22:09 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:09.748766124+01:00" level=info msg="Start cni network conf syncer"
Jul 22 02:22:09 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:09.748795271+01:00" level=info msg="Start streaming server"
Jul 22 02:22:25 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:25.330933873+01:00" level=info msg="Stop CRI service"
Jul 22 02:22:25 pi-k8s-01 systemd[1]: Stopping Service for snap application microk8s.daemon-containerd...
Jul 22 02:22:25 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:25.345294578+01:00" level=info msg="Stop CRI service"
Jul 22 02:22:25 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:25.345431743+01:00" level=info msg="Event monitor stopped"
Jul 22 02:22:25 pi-k8s-01 microk8s.daemon-containerd[30603]: time="2021-07-22T02:22:25.345469353+01:00" level=info msg="Stream server stopped"
Jul 22 02:22:25 pi-k8s-01 systemd[1]: snap.microk8s.daemon-containerd.service: Succeeded.
Jul 22 02:22:25 pi-k8s-01 systemd[1]: Stopped Service for snap application microk8s.daemon-containerd.
Jul 22 02:22:25 pi-k8s-01 systemd[1]: Starting Service for snap application microk8s.daemon-containerd...

dmesg also shows the virtual network adaptor bouncing around, but not much else:

[84269.623924] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[84269.625245] cni0: port 1(vetha72a21b7) entered blocking state
[84269.625268] cni0: port 1(vetha72a21b7) entered disabled state
[84269.625561] device vetha72a21b7 entered promiscuous mode
[84269.625662] cni0: port 1(vetha72a21b7) entered blocking state
[84269.625668] cni0: port 1(vetha72a21b7) entered forwarding state
[84320.885230] cni0: port 1(vetha72a21b7) entered disabled state

systemd reports that etcd is up, while flanneld is failing with:

failed to ensure address of interface flannel.1: link has incompatible addresses. Remove additional addresses and try again

so i'm just looking into that now, fingers crossed ^_^

expanded log (via sudo journalctl -u snap.microk8s.daemon-flanneld.service --all -n 500):

Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.516644   14053 main.go:514] Determining IP address of default interface
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.517697   14053 main.go:527] Using interface with name eth0 and address 192.168.1.54
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.517760   14053 main.go:544] Defaulting external address to interface address (192.168.1.54)
Jul 22 02:31:33 pi-k8s-01 flanneld[14053]: warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.519699   14053 main.go:244] Created subnet manager: Etcd Local Manager with Previous Subnet: 10.1.43.0/24
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.519734   14053 main.go:247] Installing signal handlers
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.599360   14053 main.go:386] Found network config - Backend type: vxlan
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.599538   14053 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.607213   14053 local_manager.go:147] Found lease (10.1.43.0/24) for current IP (192.168.1.54), reusing
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: E0722 02:31:33.614657   14053 main.go:289] Error registering network: failed to configure interface flannel.1: failed to ensure addr
Jul 22 02:31:33 pi-k8s-01 microk8s.daemon-flanneld[14053]: I0722 02:31:33.615088   14053 main.go:366] Stopping shutdownHandler...
Jul 22 02:31:33 pi-k8s-01 systemd[1]: snap.microk8s.daemon-flanneld.service: Main process exited, code=exited, status=1/FAILURE
Jul 22 02:31:33 pi-k8s-01 systemd[1]: snap.microk8s.daemon-flanneld.service: Failed with result 'exit-code'.
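one workaround i've seen suggested for this class of flannel error is to delete the stale flannel.1 link so flanneld can recreate it with the right address on restart - a sketch, assuming nothing else on the host is using that interface:

```shell
# Inspect the addresses flanneld is complaining about
ip -d addr show flannel.1
# Remove the stale vxlan link; flanneld recreates it on start
sudo ip link delete flannel.1
sudo systemctl restart snap.microk8s.daemon-flanneld.service
# Check whether it got past "Error registering network" this time
sudo journalctl -u snap.microk8s.daemon-flanneld.service -n 20
```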

ryankurte avatar Jul 22 '21 01:07 ryankurte

I've been happily running K8s on a cluster of RPi 2s and 3s for years. But as of (roughly) 1.17 it seems that Kubernetes just can't run on these machines anymore. I have similar results with microk8s and k3s. 1.17 kind of runs, but the master just dies after a while, once you've deployed one or more actual apps (like cert-manager and ArgoCD). With 1.21 I can't even get to k get nodes; it never really starts up fully, with either k3s or microk8s.

Guess I'm going to have to find something else to do with my RPis. I don't want to upgrade to RPi 4s as they run too hot; I have the cluster in my office and don't want a rack of fans running next to me all day.

teq0 avatar Aug 22 '21 03:08 teq0

@teq0 i've subsequently got k3s via k3sup running on a cluster of Pi 3s without any issue (though the leader is a Pi 4) on raspbian aarch64; it's not heavily loaded but it seems to be running alright.

ryankurte avatar Aug 22 '21 22:08 ryankurte

I was about to try that option myself. Given that microk8s is multi-master (if I'm reading the docs correctly) I don't think it will work with a mix of 3s and 4s, but single-master k3s should probably work.

Update: RPi4 as master and RPi3s as agents seems to work fine with k3s. When I get time I might try the same with microk8s.

teq0 avatar Aug 23 '21 01:08 teq0

Having similar issues as well. I'll switch to k3s to see if that works better.

tdrz avatar Sep 22 '21 20:09 tdrz

@copiltembel @teq0 if you do not want to have HA (multi-master) you can microk8s disable ha-cluster on your nodes before joining them. In this setup only the first node will run the control plane.
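For reference, that flow looks roughly like this (the address and token below are placeholders, not values from this thread):

```shell
# On every node, before joining (resets local cluster state):
microk8s disable ha-cluster

# On the first (control plane) node, print a join command:
microk8s add-node

# On each additional node, paste the command add-node printed, e.g.:
# microk8s join 192.168.1.54:25000/<token>
```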

ktsakalozos avatar Sep 23 '21 08:09 ktsakalozos

@ktsakalozos I've disabled ha-cluster and will give it another go.

I have a mixed setup with two nodes running on rpi4 (arm64) and a single amd64 machine. From what I read, HA was enabled as soon as there were 3 nodes in the cluster. But my amd64 was mostly off; maybe that caused microk8s to fail.

tdrz avatar Sep 23 '21 11:09 tdrz

I have spent 2 days figuring out why my Raspberry Pi 4 (4GB memory) is refusing connections. I ran exactly the same commands on x64 and aarch64 and tried almost everything I could find on the internet. After executing microk8s disable ha-cluster I was able to run microk8s.status, which reported it was launched, and after "microk8s.kubectl get cs" I got a healthy status for all components. It is strange that I had no such problems on x64 :octocat:

bohatermateusz avatar Sep 20 '22 18:09 bohatermateusz

So the next day I had to reinstall again, because it started showing "connection refused" again - so the problem still exists

bohatermateusz avatar Sep 21 '22 09:09 bohatermateusz

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 17 '23 10:08 stale[bot]