RISC-V support
**Is your feature request related to a problem? Please describe.**
The RISC-V ISA is gaining traction as real hardware starts to appear. This includes smaller SBCs without many resources (e.g. 1 CPU core and 1 GB of RAM), so k0s would be a great fit for trying out Kubernetes on this hardware.
**Describe the solution you would like**
k0s available not only for amd64 and arm64, but for riscv64 as well.
**Describe alternatives you've considered**
Currently, I don't see any Kubernetes distros supporting RISC-V. Podman and containerd are available on both Ubuntu and Debian and work just fine: I was able to pull containers from Docker Hub and run them.
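For example, a quick smoke test (assuming one of the images from the riscv64 Docker Hub namespace linked below):

```sh
# Pull and run a riscv64 container image on a RISC-V SBC;
# should print "riscv64" if the runtime works end to end.
podman run --rm riscv64/alpine uname -m
```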
**Additional context**
Software supporting this ISA:
- Debian: http://ftp.ports.debian.org/debian-ports/pool-riscv64/main/
- Ubuntu: http://ports.ubuntu.com/ubuntu-ports/pool/
- Alpine: https://dl-cdn.alpinelinux.org/alpine/edge/main/riscv64/
- Images on Docker Hub: https://hub.docker.com/u/riscv64
@jekader thank you for this issue. I've added it to our backlog candidates.
Additional interesting resources:
- https://github.com/carlosedp/riscv-bringup/
- https://carlosedp.medium.com/docker-containers-on-risc-v-architecture-5bc45725624b
- https://riscv.org/wp-content/uploads/2019/12/12.10-14.50c-The-RISC-V-Journey-Through-Containers-to-the-Cloud.pdf
Do the other upstream components support riscv64, like etcd, kine, kube-router, and Calico?
Does GitHub Actions support riscv64, or do we need to cross-compile? If we need to cross-compile, how do we run unit and integration tests?
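For context, plain Go cross-compilation to linux/riscv64 has been possible since Go 1.14, so producing the binaries themselves isn't the hard part. A sketch of just the Go side (the real k0s build also embeds component binaries, so this alone isn't a complete build):

```sh
# Cross-compile a pure-Go program for linux/riscv64 from any build host.
# cgo is disabled; enabling it would additionally require a riscv64 C cross-toolchain.
GOOS=linux GOARCH=riscv64 CGO_ENABLED=0 go build -o app-riscv64 .
```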
Is there precedent for having k0s support an arch for a worker, but not a controller?
Project | Supported? | Relevant links
---|---|---
etcd | not supported | https://github.com/etcd-io/etcd/issues/14522, https://github.com/etcd-io/etcd/pull/14517
kine | ? | no issue or PR
kube-router | merged, not yet released | https://github.com/cloudnativelabs/kube-router/pull/1525
calico | not supported | no issue or PR
containerd | supported | since 1.6.7 (2022-08-04)
runc | supported | since 1.1.8 (2023-07-19)
There are probably others that you didn't mention. I'll try to look into them as I figure out which ones.
> Is there precedent for having k0s support an arch for a worker, but not a controller?
This is the case for Windows. For Windows, k0s only ships kubelet and kube-proxy, relying on an external CRI and some manual shenanigans for the Calico setup, although there's currently some ongoing work to bundle containerd for Windows as well and to streamline the Calico support.
What's the status of containerd for RISC-V? I think that'd be very important for a RISC-V based k0s worker.
Updated the table above. Both containerd and runc have supported riscv64 for multiple releases.
I guess one of the main challenges would be to have a CI end-to-end test case for RISC-V. The current arm64 & armv7 are "easy" as we have dedicated runners for both of those archs. I don't think we can do that for RISC-V and thus we'd need to figure out something else. What that something else could be, I've got no idea :D
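One candidate for that something else might be qemu-user emulation on the existing amd64 runners; a sketch (binfmt registration via the image the Docker buildx docs use; emulation is slow, so this would suit scheduled runs better than per-PR CI):

```sh
# Register riscv64 binfmt handlers on an amd64 host, then run a riscv64
# container under user-mode emulation.
docker run --privileged --rm tonistiigi/binfmt --install riscv64
docker run --rm --platform linux/riscv64 riscv64/alpine uname -m
```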
I was able to build something: https://github.com/twz123/k0s/releases/tag/v1.28.2%2Bk0sriscv64.0
Feel free to give it a shot. I didn't do any deeper verification on this beyond executing the integration test suite. Basic clusters should work. Maybe somebody wants to do some further tests and share the results?
Is there any easy way to test this with a `k0sctl` cluster? I've managed to get this version uploaded, but then I get the error below. I can't easily change the version, because all the other nodes are running the "real" version 1.28.2.

```
uploaded k0s binary version is v1.28.2+k0sriscv64.0 not v1.28.2+k0s.0
```
@iggy There's also an amd64 binary available for download. If it's not a production cluster, and if the other nodes are amd64, you can deploy the RISC-V version on all of them, which should get you past the k0sctl version check. Otherwise, I don't know of any other way around it. In that case, you'd have to join the RISC-V node manually by creating a join token and adding the worker to the cluster.
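A sketch of that manual flow (these are standard k0s commands; the file path is just an example):

```sh
# On an existing controller: create a join token for a worker.
sudo k0s token create --role=worker > worker.token

# On the RISC-V node: install and start the worker using the custom binary.
sudo ./k0s-v1.28.2+k0sriscv64.0-riscv64 install worker --token-file worker.token
sudo ./k0s-v1.28.2+k0sriscv64.0-riscv64 start
```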
On amd64, the RISC-V enabled version should work just as well as the vanilla 1.28.2+k0s.0 release, so I don't see any blockers to using it for the whole cluster. Please also note that if you're not running the RISC-V enabled version on the controller nodes, you need to add the custom OCI images to your k0s config. Otherwise all the necessary pods won't be able to run on RISC-V. You can run `./k0s-v1.28.2+k0sriscv64.0-riscv64 config create --include-images` and grab the images snippet from there:
```yaml
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  images:
    coredns:
      image: quay.io/twz123/coredns
      version: 1.11.1-1@sha256:ef304af35da98ff9f1af445b103b3fd73221ffaddb802be152cd0c488ec19699
    konnectivity:
      image: quay.io/twz123/apiserver-network-proxy-agent
      version: v0.1.4-1@sha256:66a0ce4a1b7f98ea74510d30c1e96d80846c9a233b9c6eb30143d32209e127a3
    kubeproxy:
      image: quay.io/twz123/kube-proxy
      version: v1.28.2-1@sha256:77dbd9bb0b9ee748b4d39f0e998076cd1269ae097b482fd58f54dea56906efe1
    kuberouter:
      cni:
        image: quay.io/twz123/kube-router
        version: v1.6.0-iptables1.8.9-1@sha256:7ddbda29726da778945274ede6ff530351c6075695779777486d6ecc5ce8ea58
      cniInstaller:
        image: quay.io/twz123/cni-node
        version: 1.3.0-k0s.1@sha256:c08c83a7388bd3d92637846603d7065871b2c7e59f4a0de1e701c1045a1215ea
    metricsserver:
      image: quay.io/twz123/metrics-server
      version: v0.6.4-1@sha256:ee0d5a55b6724d4a955aaa5357d655092f4e4d1458f92e1b5d79d9fd127073d0
    pause:
      image: quay.io/twz123/pause
      version: 3.9-1@sha256:266cc1ad730c2a1adc10a60e7b6216ad13095ec5b0329336a557141306ec4625
```
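If you're managing the cluster with k0sctl, that snippet goes under `spec.k0s.config`; a rough sketch (host entries elided):

```yaml
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
    # ... controller and worker host entries ...
  k0s:
    version: v1.28.2+k0s.0
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      spec:
        images:
          # ... the images section from above ...
```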
Got it. Sounds like I should just do a new cluster. My current cluster is a mix of x86_64 and aarch64 nodes. I have 2 RISC-V boards, so I could do a small cluster with just them.
I'll have a look at whether I can compile the arm32/64 binaries as well. Then you could reuse the existing cluster.
@iggy I've added the arm64 build. Now you can try a mixed-arch cluster.
As a note for anyone who runs into this: when using the `uploadBinary` function, you have to have bash installed on the target nodes. The error message you get back from `k0sctl` is not super useful.
```
upload k0s binary: invalid path: open remote file /tmp/tmp.91elGRnKL6 for writing: command failed: failed to execute helper: command failed: client exec: command failed: write stdin: EOF
upload k0s binary: invalid path: open remote file /tmp/tmp.zUV1nqbGj7 for writing: command failed: failed to execute helper: command failed: client exec: ssh session wait: Process exited with status 127
```
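On Alpine-based nodes like mine, the fix boils down to (exit status 127 being the shell's "command not found"):

```sh
# bash is not part of Alpine's base install, but k0sctl's upload helper needs it.
apk add bash
```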
The `pause` container doesn't support riscv64. I'm trying to track down what would be involved in adding that.

Sorry, just noticed you have a `pause` image at the bottom of your list above. It looks like something is trying to pull the upstream `pause` image. These errors repeat continually, and k0sctl eventually gives up on adding the node.
time="2023-10-12 00:04:24" level=info msg="time=\"2023-10-12T00:04:24.779571495Z\" level=info msg=\"RunPodSandbox for &PodSandboxMetadata{Name:kube-proxy-b8jcq,Uid:84b204d7-10c3-4a07-a03a-5df6595b3aa9,Namespace:kube-system,Attempt:0,}\"" component=containerd stream=stderr
time="2023-10-12 00:04:25" level=info msg="time=\"2023-10-12T00:04:25.403288008Z\" level=info msg=\"stop pulling image registry.k8s.io/pause:3.8: active requests=0, bytes read=2945\"" component=containerd stream=stderr
time="2023-10-12 00:04:25" level=info msg="time=\"2023-10-12T00:04:25.403829019Z\" level=error msg=\"RunPodSandbox for &PodSandboxMetadata{Name:kube-proxy-b8jcq,Uid:84b204d7-10c3-4a07-a03a-5df6595b3aa9,Namespace:kube-system,Attempt:0,} failed, error\" error=\"rpc error: code = NotFound desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image \
\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": no match for platform in manifest: not found\"" component=containerd stream=stderr
time="2023-10-12 00:04:25" level=info msg="E1012 00:04:25.407988 2529 remote_runtime.go:193] \"RunPodSandbox from runtime service failed\" err=\"rpc error: code = NotFound desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image \\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": no match fo
r platform in manifest: not found\"" component=kubelet stream=stderr
time="2023-10-12 00:04:25" level=info msg="E1012 00:04:25.410562 2529 kuberuntime_sandbox.go:72] \"Failed to create sandbox for pod\" err=\"rpc error: code = NotFound desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image \\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": no match for pla
tform in manifest: not found\" pod=\"kube-system/kube-proxy-b8jcq\"" component=kubelet stream=stderr
time="2023-10-12 00:04:25" level=info msg="E1012 00:04:25.412050 2529 kuberuntime_manager.go:1166] \"CreatePodSandbox for pod failed\" err=\"rpc error: code = NotFound desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image \\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": no match for pl
atform in manifest: not found\" pod=\"kube-system/kube-proxy-b8jcq\"" component=kubelet stream=stderr
time="2023-10-12 00:04:25" level=info msg="E1012 00:04:25.414264 2529 pod_workers.go:1300] \"Error syncing pod, skipping\" err=\"failed to \\\"CreatePodSandbox\\\" for \\\"kube-proxy-b8jcq_kube-system(84b204d7-10c3-4a07-a03a-5df6595b3aa9)\\\" with CreatePodSandboxError: \\\"Failed to create sandbox for pod \\\\\\\"kube-proxy-b8jcq_kube-system(84b204d7-10c3-4a07-a03a-5df6595
b3aa9)\\\\\\\": rpc error: code = NotFound desc = failed to get sandbox image \\\\\\\"registry.k8s.io/pause:3.8\\\\\\\": failed to pull image \\\\\\\"registry.k8s.io/pause:3.8\\\\\\\": failed to pull and unpack image \\\\\\\"registry.k8s.io/pause:3.8\\\\\\\": no match for platform in manifest: not found\\\"\" pod=\"kube-system/kube-proxy-b8jcq\" podUID=\"84b204d7-10c3-4a07-a03
a-5df6595b3aa9\"" component=kubelet stream=stderr
time="2023-10-12 00:04:26" level=info msg="E1012 00:04:26.772136 2529 pod_workers.go:1300] \"Error syncing pod, skipping\" err=\"network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized\" pod=\"kube-system/konnectivity-agent-txcpj\" podUID=\"7b8657b2-3d21-4724-
8a2e-14c92a8d8880\"" component=kubelet stream=stderr
time="2023-10-12 00:04:27" level=info msg="E1012 00:04:27.774628 2529 kubelet.go:2855] \"Container runtime network not ready\" networkReady=\"NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized\"" component=kubelet stream=stderr
time="2023-10-12 00:04:28" level=info msg="E1012 00:04:28.772278 2529 pod_workers.go:1300] \"Error syncing pod, skipping\" err=\"network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized\" pod=\"kube-system/konnectivity-agent-txcpj\" podUID=\"7b8657b2-3d21-4724-
8a2e-14c92a8d8880\"" component=kubelet stream=stderr
time="2023-10-12 00:04:29" level=info msg="time=\"2023-10-12T00:04:29.779123051Z\" level=info msg=\"RunPodSandbox for &PodSandboxMetadata{Name:kube-router-xmbhl,Uid:898311a4-5079-427a-a3dc-dd1db5b2607c,Namespace:kube-system,Attempt:0,}\"" component=containerd stream=stderr
time="2023-10-12 00:04:30" level=info msg="time=\"2023-10-12T00:04:30.479965423Z\" level=info msg=\"stop pulling image registry.k8s.io/pause:3.8: active requests=0, bytes read=2945\"" component=containerd stream=stderr
time="2023-10-12 00:04:30" level=info msg="time=\"2023-10-12T00:04:30.480281679Z\" level=error msg=\"RunPodSandbox for &PodSandboxMetadata{Name:kube-router-xmbhl,Uid:898311a4-5079-427a-a3dc-dd1db5b2607c,Namespace:kube-system,Attempt:0,} failed, error\" error=\"rpc error: code = NotFound desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image
\\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": no match for platform in manifest: not found\"" component=containerd stream=stderr
time="2023-10-12 00:04:30" level=info msg="E1012 00:04:30.485103 2529 remote_runtime.go:193] \"RunPodSandbox from runtime service failed\" err=\"rpc error: code = NotFound desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image \\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": no match fo
r platform in manifest: not found\"" component=kubelet stream=stderr
time="2023-10-12 00:04:30" level=info msg="E1012 00:04:30.485999 2529 kuberuntime_sandbox.go:72] \"Failed to create sandbox for pod\" err=\"rpc error: code = NotFound desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image \\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": no match for pla
tform in manifest: not found\" pod=\"kube-system/kube-router-xmbhl\"" component=kubelet stream=stderr
time="2023-10-12 00:04:30" level=info msg="E1012 00:04:30.486515 2529 kuberuntime_manager.go:1166] \"CreatePodSandbox for pod failed\" err=\"rpc error: code = NotFound desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image \\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": no match for pl
atform in manifest: not found\" pod=\"kube-system/kube-router-xmbhl\"" component=kubelet stream=stderr
time="2023-10-12 00:04:30" level=info msg="E1012 00:04:30.487505 2529 pod_workers.go:1300] \"Error syncing pod, skipping\" err=\"failed to \\\"CreatePodSandbox\\\" for \\\"kube-router-xmbhl_kube-system(898311a4-5079-427a-a3dc-dd1db5b2607c)\\\" with CreatePodSandboxError: \\\"Failed to create sandbox for pod \\\\\\\"kube-router-xmbhl_kube-system(898311a4-5079-427a-a3dc-dd1db
5b2607c)\\\\\\\": rpc error: code = NotFound desc = failed to get sandbox image \\\\\\\"registry.k8s.io/pause:3.8\\\\\\\": failed to pull image \\\\\\\"registry.k8s.io/pause:3.8\\\\\\\": failed to pull and unpack image \\\\\\\"registry.k8s.io/pause:3.8\\\\\\\": no match for platform in manifest: not found\\\"\" pod=\"kube-system/kube-router-xmbhl\" podUID=\"898311a4-5079-427a-
a3dc-dd1db5b2607c\"" component=kubelet stream=stderr
time="2023-10-12 00:04:30" level=info msg="E1012 00:04:30.770413 2529 pod_workers.go:1300] \"Error syncing pod, skipping\" err=\"network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized\" pod=\"kube-system/konnectivity-agent-txcpj\" podUID=\"7b8657b2-3d21-4724-
8a2e-14c92a8d8880\"" component=kubelet stream=stderr
time="2023-10-12 00:04:32" level=info msg="E1012 00:04:32.772414 2529 pod_workers.go:1300] \"Error syncing pod, skipping\" err=\"network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized\" pod=\"kube-system/konnectivity-agent-txcpj\" podUID=\"7b8657b2-3d21-4724-
8a2e-14c92a8d8880\"" component=kubelet stream=stderr
time="2023-10-12 00:04:32" level=info msg="E1012 00:04:32.781193 2529 kubelet.go:2855] \"Container runtime network not ready\" networkReady=\"NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized\"" component=kubelet stream=stderr
time="2023-10-12 00:04:34" level=info msg="E1012 00:04:34.771921 2529 pod_workers.go:1300] \"Error syncing pod, skipping\" err=\"network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized\" pod=\"kube-system/konnectivity-agent-txcpj\" podUID=\"7b8657b2-3d21-4724-
8a2e-14c92a8d8880\"" component=kubelet stream=stderr
More digging leads to this:
```console
# grep pause /run/k0s/containerd-cri.toml
sandbox_image = "registry.k8s.io/pause:3.8"
```
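A cross-check of what the running CRI actually reports (assuming crictl and jq are available on the node; the socket path is where k0s puts its containerd socket):

```sh
# Ask containerd's CRI plugin which sandbox image it is currently using.
sudo crictl --runtime-endpoint unix:///run/k0s/containerd.sock info \
  | jq -r .config.sandboxImage
```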
I tried changing that and `/var/lib/k0s/worker-profile.yaml` to point to your pause image, but they keep resetting back.

I tried to add a block to the k0sctl.yaml to specify the image and version. It continues to use the upstream image for some reason. Not sure what to try next, but I'll poke at it some more tomorrow.
This kinda sounds like the controllers are not using the right images. Are they running the RISC-V version? That version should inject all the RISC-V enabled images by default, unless something is overridden in their config. Otherwise, the k0s config snippet that I pasted in a previous comment can be used to achieve the same when running vanilla k0s controllers.

Note that the image configuration is managed by the k0s controllers, not by the workers, so you definitely need to check the k0s config on the controllers (no matter which arch they're running on), not on the individual workers.

The containerd pause image snippet you're referring to is explicitly managed by k0s. It should get the pause image that's configured in the k0s controller's config. You can also check the currently active pause image in the worker-config ConfigMaps. For the default worker profile, you can issue the following command:
```console
$ sudo k0s kc -n kube-system get cm worker-config-default-1.28 -ojson | jq .data.pauseImage
"{\"image\":\"quay.io/twz123/pause\",\"version\":\"3.9-1@sha256:266cc1ad730c2a1adc10a60e7b6216ad13095ec5b0329336a557141306ec4625\"}"
```
If that's listing the vanilla pause image, then something is off. If the ConfigMap lists the right image, but the containerd config doesn't receive it, then there might be some problem with k0s's containerd config management. Can you confirm that a single node cluster behaves as expected, e.g. by running `sudo k0s controller --single` with the RISC-V build? It doesn't really matter on which architecture. That k0s version should always use the custom images by default, and also inject the containerd config snippet accordingly.

Editing `/var/lib/k0s/worker-profile.yaml` is not going to work. This is just a cache file that's used during worker startup. As soon as the worker is able to connect to the cluster, it will re-sync that file from the worker-config ConfigMap.
BTW, you don't use dynamic configuration, do you? In that case, you should check the k0s configuration that's stored in the cluster instead of the one that's used to start the controllers.
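In that case, something along these lines shows the cluster-managed configuration (assuming dynamic config is enabled; the ClusterConfig object lives in kube-system):

```sh
# Inspect the configuration stored in the cluster instead of the local file.
sudo k0s kc -n kube-system get clusterconfig k0s -oyaml
```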
Spot on. I added the riscv64 node to the config, so k0sctl is trying to add it first rather than "upgrade" the existing nodes. I'll try running k0sctl without the new node in the config first and then add it back in.
```console
# kc get no --show-labels
NAME           STATUS                     ROLES    AGE    VERSION             LABELS
lab001-004     Ready                      <none>   116d   v1.28.2+k0s-dirty   beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=lab001-004,kubernetes.io/os=linux
lab001-005     Ready,SchedulingDisabled   <none>   32d    v1.28.2+k0s-dirty   beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=lab001-005,kubernetes.io/os=linux
lab001-006     Ready                      <none>   112d   v1.28.2+k0s-dirty   beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=lab001-006,kubernetes.io/os=linux
lab001-007     Ready                      <none>   110d   v1.28.2+k0s-dirty   beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=lab001-007,kubernetes.io/os=linux
lab001-008     Ready                      <none>   17h    v1.28.2+k0s-dirty   beta.kubernetes.io/arch=riscv64,beta.kubernetes.io/os=linux,kubernetes.io/arch=riscv64,kubernetes.io/hostname=lab001-008,kubernetes.io/os=linux
lab001-vm001   Ready                      <none>   110d   v1.28.2+k0s-dirty   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=lab001-vm001,kubernetes.io/os=linux

# kc get no lab001-008 -oyaml
...
status:
  ...
  nodeInfo:
    architecture: riscv64
    bootID: a0190eaa-2e58-413b-b636-2605b165183b
    containerRuntimeVersion: containerd://1.7.6
    kernelVersion: 6.5.7-2-starfive
    kubeProxyVersion: v1.28.2+k0s-dirty
    kubeletVersion: v1.28.2+k0s-dirty
    machineID: bda70c9096751d6a9612736a640f01e9
    operatingSystem: linux
    osImage: Alpine Linux edge
    systemUUID: bda70c9096751d6a9612736a640f01e9

# kc -n riscv64-test get po -owide
NAME                            READY   STATUS    RESTARTS   AGE     IP           NODE         NOMINATED NODE   READINESS GATES
riscv64-test-7f677656cc-q286s   1/1     Running   0          2m46s   10.244.5.8   lab001-008   <none>           <none>

# kc -n riscv64-test exec -it riscv64-test-7f677656cc-q286s -- /bin/sh
/ # arch
riscv64
/ #
```
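For anyone reproducing this, a minimal sketch of a deployment pinned to the riscv64 node (the image is just an example; any riscv64-capable image works):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: riscv64-test
  namespace: riscv64-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: riscv64-test
  template:
    metadata:
      labels:
        app: riscv64-test
    spec:
      # Schedule only onto riscv64 nodes via the standard arch label.
      nodeSelector:
        kubernetes.io/arch: riscv64
      containers:
        - name: shell
          image: riscv64/alpine
          command: ["sleep", "infinity"]
```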
Uuuh, sweeet!
Anything else you need me to test for proper functionality?
I'm guessing the next step is figuring out some way to do E2E testing. I'm going to guess QEMU isn't an option. There's some hardware on the horizon that should make this doable, but currently available boards probably aren't ideal. The easiest boards to get right now are probably VisionFive 2s, which are roughly equivalent to a RPi 3 speed-wise. The Lichee Pi 4A is relatively new, but I haven't gotten Alpine running on mine yet. There's a Lichee Pi 4A cluster board coming soon (the LM4A is a CM3-style baseboard; the Lichee Pi 4A is the LM4A plus a carrier). That's 7x LM4A on one cluster board. There's also a 64-core Milk-V Pioneer shipping soonish as well.
Thank you for your dedication to testing k0s on RISC-V. Much appreciated! Your multi-arch cluster looks awesome. :smile:
k0s has a rather comprehensive integration test suite. I've already managed to get a good portion of those tests passing on RISC-V. I haven't had the chance to thoroughly investigate the failures yet, but the ones I looked at boiled down to requiring some binary (helm, cri-dockerd) or OCI image that's not available for RISC-V yet. I didn't spend any effort on Calico yet, for example. If you're dedicated enough, you could try to see whether any of the "real" integration test failures fail for reasons other than something having to be compiled for RISC-V.
We could take it step by step from here:

- Some (tiny) patches to the k0s codebase are not yet merged. The goal would be to have `make GO='' EMBEDDED_BINS_BUILDMODE=none` working on RISC-V using a clean checkout (see the sketch after this list).
- A minor patch needs to be applied to the Kubernetes build system to add riscv64 as a supported architecture. Not sure this will make it upstream anytime soon.
- Maybe QEMU is an option for testing. That would rather happen on a scheduled basis than on a per-PR basis. On the positive side of things, in contrast to Windows, I don't expect many RISC-V specific problems that aren't observable on other arches. (Okay, Windows is an OS, not an architecture, but you get my point :see_no_evil:)
- Lastly, the biggest issue: the build pipeline for the OCI images provided by k0s needs to be adjusted to produce RISC-V images. That even includes the base OCI image for k0s's build process. I've built the OCI images for this test in an ad-hoc fashion to get things going. That's nothing that I'd like to upstream.
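As referenced in the first item, a sketch of the intended from-source flow on a riscv64 host once those patches are merged (treat this as the goal, not something that works today):

```sh
# Build k0s from a clean checkout on a riscv64 host, using the flags
# from the list above to skip the embedded component binaries.
git clone https://github.com/k0sproject/k0s.git
cd k0s
make GO='' EMBEDDED_BINS_BUILDMODE=none
```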
I don't see the k0s project providing any RISC-V binaries during its release process just yet. For that, Kubernetes should be buildable on RISC-V without the need for patches, and we'd probably need real hardware to test on. But: building from source should be easy.
So maybe the goal of this issue could be:
- Make k0s's build process work out of the box on RISC-V
- Document how k0s can be built on RISC-V
- Provide k0s multi-arch OCI images also for RISC-V
- Potentially add some QEMU based scheduled testing
What do you think?
BTW, I did all my testing on the LicheePi 4A. I'm aware of both the LicheePi Cluster and the Milk-V Pioneer. Having hardware is one thing, but I failed to get the GitHub runner working on the LiPi (.NET is currently in the process of adding RISC-V support as well; I tried to compile something, but this is massive...), so those machines couldn't be integrated into the k0s GitHub workflows without further tricks.
> Spot on. I added the riscv64 node to the config, so k0sctl is trying to add it first rather than "upgrade" the existing nodes. I'll try running k0sctl without the new node in the config first and then add it back in.
I wonder if k0sctl has some logic around that. When doing a cluster upgrade, say from 1.27 to 1.28, and adding a worker node in the same go, would k0sctl first add the 1.28 worker before updating the control plane? That sounds like the wrong order. Ever thought about this, @kke?

The above example doesn't apply to iggy's case, since this wasn't really a cluster upgrade (the minor version remained the same), but could it be a good idea to update the control plane before adding nodes?
@ncopa What would be the appropriate steps to get binutils-gold built for riscv64 on alpine edge?