kubespray doesn't support two architectures in one cluster
Environment:
- Virtual machines in Oracle Cloud
- node1,2:
Linux 5.8.0-1037-oracle aarch64
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
- node3,4:
Linux 5.11.0-1016-oracle x86_64
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
- Version of Ansible: ansible 2.10.11
- Version of Python: python version = 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0]
- Kubespray version (commit): 425b6741
- Network plugin used: calico
- Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"): I think it is not necessary because kubespray doesn't fail
- Command used to invoke ansible: ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
- Output of ansible run: I think it is not necessary because kubespray doesn't fail
Anything else we need to know:
I have 4 machines: two with arm64 architecture and two with amd64. Kubespray only supports a single architecture per cluster and selects it at runtime. The architecture is used to determine the calico CNI image tag:
./roles/download/defaults/main.yml:calico_cni_image_tag: "{{ calico_cni_version }}{%- if image_arch != 'amd64' -%}-{{ image_arch }}{%- endif -%}"
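For illustration, assuming calico_cni_version is v3.19.2 (the version that appears in the logs further down), that template renders as follows:
# image_arch: amd64 -> tag "v3.19.2"       -> quay.io/calico/cni:v3.19.2
# image_arch: arm64 -> tag "v3.19.2-arm64" -> quay.io/calico/cni:v3.19.2-arm64
# The manifests are rendered once for the cluster, so every node gets the same
# tag regardless of its own architecture.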
Calico must run on every node, but it fails on the amd64 nodes (I invoke ansible from an arm64 node) because the DaemonSet tries to run the arm64 image.
I created two separate DaemonSets with a beta.kubernetes.io/arch nodeSelector to work around that problem (a sketch is below).
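Roughly, that workaround looks like the following minimal sketch, with hypothetical names and only the arch-relevant fields shown (the real calico-node DaemonSet carries many more settings, env vars and volumes):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: calico-node-arm64              # hypothetical name; one copy per architecture
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: calico-node-arm64
  template:
    metadata:
      labels:
        k8s-app: calico-node-arm64
    spec:
      nodeSelector:
        beta.kubernetes.io/arch: arm64   # kubernetes.io/arch on newer clusters
      containers:
        - name: calico-node
          image: quay.io/calico/node:v3.19.2-arm64   # arch-specific tag
# ...plus an equivalent calico-node-amd64 DaemonSet with
# "beta.kubernetes.io/arch: amd64" and image quay.io/calico/node:v3.19.2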
Is it possible to add support to create separate DaemonSet for each architecture in kubespray?
I realized that calico ships multi-arch images that could be used in the DaemonSet: https://hub.docker.com/r/calico/pod2daemon-flexvol/tags?page=1&ordering=last_updated
I think the problem is that calico only ships multi-arch images to Docker Hub, not quay.io: https://github.com/projectcalico/calico/issues/4227
And kubespray only downloads images from quay.io, not Docker Hub: https://github.com/kubernetes-sigs/kubespray/blob/c7e17688b96f4cb6268b41e4c6a22ebf52e22dec/roles/download/defaults/main.yml#L466
I might be able to just hardcode docker.io in the calico_<component>_image_repo variables, but I haven't tried; a sketch of what that might look like is below.
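Untested, but the overrides would presumably be a handful of inventory group vars along these lines (variable names as used later in this thread; the group_vars path is just the usual kubespray inventory layout):
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
calico_node_image_repo: "docker.io/calico/node"
calico_cni_image_repo: "docker.io/calico/cni"
calico_flexvol_image_repo: "docker.io/calico/pod2daemon-flexvol"
calico_policy_image_repo: "docker.io/calico/kube-controllers"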
Other plugins I tested:
- kube-router worked out of the box without tweaking.
- cilium looks to have started publishing multi-arch images to both quay.io and Docker Hub since v1.10.0. Kubespray's current default version on the master branch is 1.9.10; I bumped that to v1.10.4 and it worked fine (see the override sketch below).
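For reference, the bump itself was just a group vars override, roughly as below (cilium_version is the variable name in roles/download/defaults/main.yml; double-check it against the kubespray revision in use):
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_network_plugin: cilium
cilium_version: "v1.10.4"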
I asked in the Slack channel what the views would be on also including the detected architecture (used to determine the image in the template) as an affinity requirement within the deployment template, so pods would only be scheduled somewhere they'll function, but I didn't get a reply; it's probably a bit of a hacky solution.
Correct me if I am wrong, but I do see multi-arch images on quay.io for Calico CNI.
I believe the problem is not the lack of a multi-arch image, but Kubespray deploying the wrong images to x86 nodes.
I am having the same problem. In my test environment, I confirmed that an arm64 container was deployed to an x64 worker node.
root@study-k8s-worker01:/var/log/containers# crictl inspect 276bd130af02d | less
{
  "status": {
    "id": "2b43dcd28ea76573ff065c9327c1a57877d7342c4e42dd2ec1770fe57a60e01e",
    "metadata": {
      "attempt": 17,
      "name": "upgrade-ipam"
    },
    "state": "CONTAINER_EXITED",
    "createdAt": "2021-12-02T07:29:43.056868981Z",
    "startedAt": "2021-12-02T07:29:43.40988203Z",
    "finishedAt": "2021-12-02T07:29:43.407799647Z",
    "exitCode": 1,
    "image": {
      "annotations": {},
      "image": "quay.io/calico/cni:v3.19.2-arm64"
    },
    "imageRef": "quay.io/calico/cni@sha256:975faed475765950b2401ab009a5e8f305a84e02adbbecd77d7dd1fec2254647",
    "reason": "Error",
    "message": "",
    "labels": {
      "io.kubernetes.container.name": "upgrade-ipam",
      "io.kubernetes.pod.name": "calico-node-7dfvf",
      "io.kubernetes.pod.namespace": "kube-system",
      "io.kubernetes.pod.uid": "024c10a8-5077-4d83-a0fc-62decc57f18c"
    },
And I agree with @andybrook that the architecture should be detected and reflected in all playbooks.
I have tested with 2.17 and can confirm that I still can't use quay with a 3-master setup (the 1st and 2nd masters are amd64 and the third is arm64) without the following 5 lines of config in place to resolve the issue.
calico_node_image_repo: "{{ docker_image_repo }}/calico/node" # arch problems
calico_cni_image_repo: "{{ docker_image_repo }}/calico/cni" # arch problems
calico_flexvol_image_repo: "{{ docker_image_repo }}/calico/pod2daemon-flexvol" # arch problems
calico_policy_image_repo: "{{ docker_image_repo }}/calico/kube-controllers" # arch problems
calico_typha_image_repo: "{{ docker_image_repo }}/calico/typha" # arch problems
I've had to add in the 3rd line (flexvol) today as that image is new with this release.
Issues also persist with dns-autoscaler and metrics-server when they happen to run on an arch that doesn't match the arch of the 1st master listed.
Thank you @andybrook for the ansible variables for doing this
Looks like nodeSelectors on the dns-autoscaler and metrics-server deployments will resolve the remaining issues (a patch sketch is below). I'll test this in the next week and report my own findings.
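The idea would be a strategic merge patch along these lines, pinning those Deployments to the architecture of the images kubespray actually rendered (amd64 is assumed here purely for illustration; the Deployment names are the components mentioned above):
# arch-pin.yaml, applied with for example:
#   kubectl -n kube-system patch deployment dns-autoscaler --patch-file arch-pin.yaml
#   kubectl -n kube-system patch deployment metrics-server --patch-file arch-pin.yaml
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64   # match the arch of the images that were deployed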
It seems like the issue now is only with pulling this particular version of calico from quay, because they aren't pushing multi-arch containers to quay.io, only to docker.io. It's not clear when that will change, but there's an issue raised here: https://github.com/projectcalico/calico/issues/4692
For example, in my setup master-1 is amd64 and master-3 is arm64
The same images are pulled from quay and docker (amd64)...
root@k8s-master-1:~# crictl pull quay.io/calico/node:v3.19.3
Image is up to date for sha256:5ee77fcf72b487588a8959579808a2d0c89bc09118eb53f83c2ef0430183ad45
root@k8s-master-1:~# crictl pull docker.io/calico/node:v3.19.3
Image is up to date for sha256:5ee77fcf72b487588a8959579808a2d0c89bc09118eb53f83c2ef0430183ad45
Yet different images are pulled from quay and docker (arm64)...
root@k8s-master-3:~# crictl pull quay.io/calico/node:v3.19.3
Image is up to date for sha256:5ee77fcf72b487588a8959579808a2d0c89bc09118eb53f83c2ef0430183ad45
root@k8s-master-3:~# crictl pull docker.io/calico/node:v3.19.3
Image is up to date for sha256:e89834e5752c7d7a772f6a92b5afd4964d2e453094ff68063790b14da7ede51b
When the arch is appended to the tag, it shows that the arm image was pulled from docker.
root@k8s-master-3:~# crictl pull quay.io/calico/node:v3.19.3-arm64
Image is up to date for sha256:e89834e5752c7d7a772f6a92b5afd4964d2e453094ff68063790b14da7ede51b
Note that, contrary to the linked calico GitHub issue, this is not solely a problem for the CNI image as reported there, and the testing above is with 3.19.3 (rather than 3.19.2) since that version was not already present on my setup.
In an update to my earlier comment, both dns-autoscaler (coredns/coredns below) and metrics-server seem to fetch the right arch automatically from Google (k8s.gcr.io) now, as evidenced by the different image IDs listed on the two masters below. I tested with coredns/coredns in my setup (I didn't test the metrics-server since it happened to land on the amd64 node by chance).
root@k8s-master-1:~# crictl img
IMAGE TAG IMAGE ID SIZE
docker.io/calico/cni v3.19.2 05bf027c9836a 48.3MB
docker.io/calico/kube-controllers v3.19.2 779aa7e4e93c4 25MB
docker.io/calico/node v3.19.2 7aa1277761b51 59.3MB
docker.io/calico/pod2daemon-flexvol v3.19.2 6a1186da14d91 9.36MB
docker.io/grafana/promtail 2.1.0 c8a24f224215e 63.8MB
k8s.gcr.io/addon-resizer 1.8.11 b7db21b30ad90 9.35MB
k8s.gcr.io/coredns/coredns v1.8.0 296a6d5035e2d 12.9MB
k8s.gcr.io/cpa/cluster-proportional-autoscaler-amd64 1.8.3 078b6f04135ff 15.2MB
k8s.gcr.io/dns/k8s-dns-node-cache 1.17.1 21fc69048bd5d 57MB
k8s.gcr.io/kube-apiserver v1.21.6 f6f0f372360b3 30.4MB
k8s.gcr.io/kube-controller-manager v1.21.6 90050ec9b1301 29.4MB
k8s.gcr.io/kube-proxy v1.21.6 01d07d3b4d18a 35.9MB
k8s.gcr.io/kube-scheduler v1.21.6 c51494bd8791e 14.6MB
k8s.gcr.io/metrics-server/metrics-server v0.5.0 1c655933b9c56 25.8MB
k8s.gcr.io/pause 3.3 0184c1613d929 299kB
k8s.gcr.io/pause 3.4.1 0f8457a4c2eca 301kB
quay.io/metallb/speaker v0.10.2 425ebe418b9b9 39.3MB
root@k8s-master-3:~# crictl img
IMAGE TAG IMAGE ID SIZE
docker.io/calico/cni v3.19.2-arm64 266411e1801c0 43.8MB
docker.io/calico/cni v3.19.2 266411e1801c0 43.8MB
docker.io/calico/kube-controllers v3.19.2-arm64 1b4ac43050989 23.3MB
docker.io/calico/node v3.19.2 3f22b504feed4 45MB
docker.io/calico/node v3.19.2-arm64 3f22b504feed4 45MB
docker.io/calico/pod2daemon-flexvol v3.19.2 be3c64e9532e3 4.73MB
docker.io/grafana/promtail 2.1.0 1c661c5eb8130 60.4MB
k8s.gcr.io/addon-resizer 1.8.11 b7db21b30ad90 9.35MB
k8s.gcr.io/coredns/coredns v1.8.0 1a1f05a2cd7c2 11.6MB
k8s.gcr.io/cpa/cluster-proportional-autoscaler-arm64 1.8.3 3b085bb9a41e1 14.2MB
k8s.gcr.io/dns/k8s-dns-node-cache 1.17.1 069cfe96da7d5 56.9MB
k8s.gcr.io/kube-apiserver v1.21.6 53826590cdd73 27.7MB
k8s.gcr.io/kube-controller-manager v1.21.6 6c072299f0050 26.7MB
k8s.gcr.io/kube-proxy v1.21.6 731123a07526c 34.3MB
k8s.gcr.io/kube-scheduler v1.21.6 f13b4e164b917 13.1MB
k8s.gcr.io/metrics-server/metrics-server v0.5.0 ee11f1e38ac66 24.2MB
k8s.gcr.io/pause 3.3 3d18732f8686c 251kB
k8s.gcr.io/pause 3.4.1 d055819ed991a 253kB
quay.io/metallb/speaker v0.10.2 f0dbaf4777116 36.9MB
This is an example line of config from roles/download/defaults/main.yml
calico_node_image_tag: "{{ calico_version }}{%- if image_arch != 'amd64' -%}-{{ image_arch }}{%- endif -%}"
I am a bit confused as to why the arch appended to the tag in that line above doesn't work when it's pulled from quay; there has been a suggestion above that this is due to being invoked on an amd64 instance, and I can confirm that I am invoking this from an amd64 instance too.
It does seem a little odd that on the arm64 node above (k8s-master-3) I have both images with the arch appended to the tag and without, but they both have the ID of the right arch on docker. I wonder if that's related.
TL;DR: there looks to be a bug in the fetching, probably related to where ansible is invoked, but I imagine that once the calico project begins pushing multi-arch images to quay this issue will be resolved, and the extra logic appending the arch to the tag can be removed. Until then, the best option for a multi-arch cluster is to use docker.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
I believe this issue still exists. I track all the hacks I need to do in this file.
/remove-lifecycle stale
Yeah I'm having some REALLY weird issues with checksum failures, where it looks like it is expecting the ARM64 checksum to match on the AMD64 binary or something like that.
TASK [container-engine/runc : download_file | Validate mirrors] *******************************************
ok: [node1 -> node1] => (item=https://github.com/opencontainers/runc/releases/download/v1.1.0/runc.amd64)
ok: [node2 -> node2] => (item=https://github.com/opencontainers/runc/releases/download/v1.1.0/runc.arm64)
ok: [node3 -> node3] => (item=https://github.com/opencontainers/runc/releases/download/v1.1.0/runc.arm64)
Sunday 27 March 2022 03:58:51 +0000 (0:00:02.844) 0:01:21.630 **********
TASK [container-engine/runc : download_file | Get the list of working mirrors] ****************************
ok: [node1 -> node1]
ok: [node2 -> node2]
ok: [node3 -> node3]
Sunday 27 March 2022 03:58:52 +0000 (0:00:00.140) 0:01:21.771 **********
TASK [container-engine/runc : download_file | Download item] **********************************************
ok: [node2 -> node2]
ok: [node3 -> node3]
FAILED - RETRYING: download_file | Download item (4 retries left).
FAILED - RETRYING: download_file | Download item (3 retries left).
FAILED - RETRYING: download_file | Download item (2 retries left).
FAILED - RETRYING: download_file | Download item (1 retries left).
fatal: [node1 -> node1]: FAILED! => {"attempts": 4, "changed": true, "checksum_dest": null, "checksum_src": "04a40db8c47a378d3a5ed551c2f7b8e2ab17fd15", "dest": "/tmp/releases/runc", "elapsed": 0, "msg": "The checksum for /tmp/releases/runc did not match ab1c67fbcbdddbe481e48a55cf0ef9a86b38b166b5079e0010737fd87d7454bb; it was 9ec8e68feabc4e7083a4cfa45ebe4d529467391e0b03ee7de7ddda5770b05e68.", "src": "/root/.ansible/tmp/ansible-moduletmp-1648353557.6267958-azuq6rg7/tmp3pmjrqqp", "url": "https://github.com/opencontainers/runc/releases/download/v1.1.0/runc.arm64"}
You can see up top that node1 is an amd64 node, but it's trying the arm64 checksum against it and complaining because it's returning the correct checksum for the amd64 binary :-/
Hopefully your fixes work, I'll check them out now!
Edit: I disabled the checksum submission to get_url to try to get around it, but it turns out something is even more miswired than I thought, because now it's actually downloading arm64 binaries and installing them on the amd64 host, resulting in a crash at "can't start containerd, check journalctl xe", which says "exec format error" ... I checked the crictl it installed and it's the exact same thing.
@hyacin75 At first glance, I suspect it was related to #8474. Specifically:
https://github.com/kubernetes-sigs/kubespray/blob/c6e5314fab3ee2e05590b69f578a4fb1ae1903e5/roles/download/tasks/download_file.yml#L50-L52
It doesn't seem to take multi-arch into consideration and just picks one of the verified mirror URLs at random for all downloads.
However, I am not a maintainer and I can't verify (I only have one multi-arch cluster and it is currently pinning the kubespray v2.18.0 release).
The issue I have is when it needs to start downloading pods:
TASK [download : debug] *******************************************************
ok: [master] => { "msg": "Pull k8s.gcr.io/pause:3.3 required is: True" }
ok: [pi1] => { "msg": "Pull k8s.gcr.io/pause:3.3 required is: True" }
ok: [pi2] => { "msg": "Pull k8s.gcr.io/pause:3.3 required is: True" }
ok: [pi3] => { "msg": "Pull k8s.gcr.io/pause:3.3 required is: True" }
ok: [hellbox] => { "msg": "Pull k8s.gcr.io/pause:3.3 required is: True" }
Friday 13 May 2022 04:52:14 -0300 (0:00:00.061) 0:04:21.781 ************
Friday 13 May 2022 04:52:14 -0300 (0:00:00.056) 0:04:21.837 ************
Friday 13 May 2022 04:52:15 -0300 (0:00:00.057) 0:04:21.895 ************
Friday 13 May 2022 04:52:15 -0300 (0:00:00.059) 0:04:21.954 ************
FAILED - RETRYING: download_container | Download image if required (4 retries left).
TASK [download_container | Download image if required] ************************
changed: [pi1 -> pi1]
changed: [pi2 -> pi2]
changed: [pi3 -> pi3]
changed: [master -> master]
FAILED - RETRYING: download_container | Download image if required (3 retries left).
FAILED - RETRYING: download_container | Download image if required (2 retries left).
FAILED - RETRYING: download_container | Download image if required (1 retries left).
fatal: [hellbox -> hellbox]: FAILED! => {"attempts": 4, "changed": true, "cmd": ["/usr/local/bin/nerdctl", "-n", "k8s.io", "pull", "--quiet", "k8s.gcr.io/pause:3.3"], "delta": "0:00:00.015483", "end": "2022-05-13 07:52:39.950763", "msg": "non-zero return code", "rc": 1, "start": "2022-05-13 07:52:39.935280", "stderr": "time="2022-05-13T07:52:39Z" level=fatal msg="cannot access containerd socket \"/var/run/containerd/containerd.sock\": no such file or directory"", "stderr_lines": ["time="2022-05-13T07:52:39Z" level=fatal msg="cannot access containerd socket \"/var/run/containerd/containerd.sock\": no such file or directory""], "stdout": "", "stdout_lines": []}
hellbox is amd64 and the rest are arm64. kubelet, kubeadm, runc, containerd, etc. all install with the correct arch on each node, but when it pulls the images, it doesn't pull the correct arch for each node.
Is there any short-term "hack" or way to work around the issue? I have at least 3 Apple M1 Pros plus a couple of PCs that I want to combine into a bigger cluster. I can do this by installing k8s without kubespray, but it would take much more time for me to set everything up. So any help would be appreciated :-) @yuha0
Okay, in roles/download/tasks/download_file.yml, under the download_file | Download item task, changing checksum to "{{ omit }}" ignores the checksums, which allows the setup process to continue (sketched below).
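For reference, the edit amounts to something like the following, a paraphrased sketch of the relevant get_url parameters rather than the verbatim kubespray task (and note it disables checksum validation entirely):
- name: download_file | Download item
  ansible.builtin.get_url:
    url: "{{ download.url }}"       # or the mirror URL the task already computes
    dest: "{{ download.dest }}"     # e.g. /tmp/releases/runc in the logs above
    checksum: "{{ omit }}"          # was a per-arch sha256; omit skips validation
    mode: "0755"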
However, another problem arises:
fatal: [node1]: FAILED! => {"changed": false, "msg": "Unable to start service containerd: Job for containerd.service failed because the control process exited with error code.\nSee \"systemctl status containerd.service\" and \"journalctl -xeu containerd.service\" for details.\n"}
I hope I'll be able to fix this soon ...
I resolved it by first installing the amd64 nodes and then running ansible again to add the ARM nodes. That places the containers correctly and I don't get the exec format error.
I finally got some time to upgrade my multi-arch cluster. Since what broke my installation was #8474, I just reverted the change: https://github.com/yuha0/kubespray/commit/472c8f7f260ed185176aa3fa426fb3803f24a6b2
After that I was able to finish the upgrade successfully.
Now my cluster is at k8s 1.23.7 with kubespray release 2.19.1. Not sure if calico or other CNIs are still having issues, but I use cilium, and in kubespray 2.19.1 it has been bumped to a version with proper multi-arch support. As a result I no longer have any issues and plan to just keep upgrading to later kubespray releases.
I have the same issue even with the runc install:
fatal: [carbon02]: FAILED! => {"attempts": 4, "changed": true, "checksum_dest": null, "checksum_src": "8fd8b73c8101564a441346948fb9d9b57b280540", "dest": "/tmp/releases/runc", "elapsed": 0, "msg": "The checksum for /tmp/releases/runc did not match db772be63147a4e747b4fe286c7c16a2edc4a8458bd3092ea46aaee77750e8ce; it was dbb71e737eaef454a406ce21fd021bd8f1b35afb7635016745992bbd7c17a223.", "src": "/root/.ansible/tmp/ansible-moduletmp-1677944131.377251-79lzlymf/tmpk7jpx6m_", "url": "https://github.com/opencontainers/runc/releases/download/v1.1.4/runc.arm64"}
fatal: [carbon03]: FAILED! => {"attempts": 4, "changed": true, "checksum_dest": null, "checksum_src": "8fd8b73c8101564a441346948fb9d9b57b280540", "dest": "/tmp/releases/runc", "elapsed": 0, "msg": "The checksum for /tmp/releases/runc did not match db772be63147a4e747b4fe286c7c16a2edc4a8458bd3092ea46aaee77750e8ce; it was dbb71e737eaef454a406ce21fd021bd8f1b35afb7635016745992bbd7c17a223.", "src": "/root/.ansible/tmp/ansible-moduletmp-1677944131.5116482-848q31g3/tmpr5h36hze", "url": "https://github.com/opencontainers/runc/releases/download/v1.1.4/runc.arm64"}
fatal: [carbon01]: FAILED! => {"attempts": 4, "changed": true, "checksum_dest": null, "checksum_src": "8fd8b73c8101564a441346948fb9d9b57b280540", "dest": "/tmp/releases/runc", "elapsed": 0, "msg": "The checksum for /tmp/releases/runc did not match db772be63147a4e747b4fe286c7c16a2edc4a8458bd3092ea46aaee77750e8ce; it was dbb71e737eaef454a406ce21fd021bd8f1b35afb7635016745992bbd7c17a223.", "src": "/root/.ansible/tmp/ansible-moduletmp-1677944131.7942212-1zyt1t3y/tmp3k7hs0we", "url": "https://github.com/opencontainers/runc/releases/download/v1.1.4/runc.arm64"}
Carbon01-03 are amd64 (control plane); Hydrogen01-12 are arm64 (nodes).
@iaacautomation I had the same issue. What worked for me (and I can't explain why) was to deploy only the amd64 (control plane) nodes, then reset the cluster, then re-deploy with both the amd64 control plane nodes and the arm64 worker nodes...
I used a fork which has multi-arch support and it worked; it just finished, and I'm going to check everything: https://github.com/yuha0/kubespray
I have the same issue here; it only appears when I have nodes with different architectures.
As a workaround, I modified the line https://github.com/kubernetes-sigs/kubespray/blob/2cf23e3104f9b8b20ca1aefd36e3e89be26fd090/roles/download/tasks/download_file.yml#L87
from this:
url: "{{ valid_mirror_urls | random }}"
to this:
url: "{{ download.url }}"
I think the mirror URLs are used to improve the performance of the installation, and I didn't mind waiting a bit longer and skipping the mirrors (a sketch of the resulting task is below).
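For contrast with the checksum-omitting hack earlier in this thread, this change keeps checksum validation and only drops the cross-mirror randomisation; paraphrased (attribute names are assumptions, not the verbatim file), the task ends up looking roughly like:
- name: download_file | Download item
  ansible.builtin.get_url:
    url: "{{ download.url }}"                 # was: "{{ valid_mirror_urls | random }}"
    dest: "{{ download.dest }}"
    checksum: "sha256:{{ download.sha256 }}"  # assumed attribute; the per-arch checksum stays in place
    mode: "0755"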
We just removed the mirrors support; does the problem still exist? /triage needs-information
@VannTen If you meant 6f419aa18ecaf94807968914728de416e021bd86: I have been patching every single kubespray release in my fork with a similar change for 2 years and it has been working very well for me, so I believe the removal should have fixed it.
It looks like the commit was not cut into the 2.24.1 release, so I couldn't test it for real (I only have one homelab cluster and that's my prod environment 😄 ). I will keep an eye out for the next stable release.
I upgraded my multiarch cluster with kubespray 2.25 successfully without any modification. Looks like the fix worked.
@tu1h Do you know whether it is fixed? Can we close the issue? :-)
@yankay AFAIK it was fixed by ansible-pull-80476; it has been solved since ansible-core 2.16.