
Document version compatibility (all dependencies)

Open antoinep92 opened this issue 2 years ago • 16 comments

Hi, I just spent quite some time figuring out what was wrong with my Kubernetes cluster. It appears the latest weave is not compatible with the latest CNI. That is totally fine, but I think it should be made more apparent.

My suggestion would be to either include that information in the release notes, or to add a compatibility table in the documentation. Maybe there is one but I haven't been able to locate one.

Also, the recommended installation method is to get a YAML from your website, which depends only on the Kubernetes version and not on the CNI version. So there should at least be a warning about that in the install doc.

As far as I can tell, latest weave works with CNI protocol version =0.3.0 and CNI release <=0.8.1
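
For anyone wanting to check which protocol version a node's CNI configuration declares, a rough sketch follows. It assumes the usual conflist layout (a top-level `cniVersion` plus a `plugins` array). The sample config and the function name `describe_conflist` are illustrative only and are not taken from weave's installer; on a real node the text would come from a file under /etc/cni/net.d/, which is also an assumption about a typical setup.

```python
import json

# Illustrative conflist (hypothetical content, for demonstration only).
SAMPLE_CONFLIST = """
{
    "cniVersion": "0.3.0",
    "name": "weave",
    "plugins": [
        {"name": "weave", "type": "weave-net", "hairpinMode": true},
        {"type": "portmap", "capabilities": {"portMappings": true}, "snat": true}
    ]
}
"""

def describe_conflist(text):
    """Return (protocol version, [plugin types]) declared by a CNI conflist."""
    conf = json.loads(text)
    # A missing cniVersion is reported as None rather than guessed.
    version = conf.get("cniVersion")
    plugin_types = [p.get("type") for p in conf.get("plugins", [])]
    return version, plugin_types

if __name__ == "__main__":
    version, plugin_types = describe_conflist(SAMPLE_CONFLIST)
    print(f"protocol version: {version}, plugins: {plugin_types}")
```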

antoinep92 avatar Feb 18 '22 13:02 antoinep92

Why do you think it is not compatible with newer versions? I have not found an issue yet. I am using latest k3s, latest weave and CNI Plugins 1.1.0.

amiga23 avatar Mar 11 '22 19:03 amiga23

Yeah, I'm sorry. It does indeed work with any CNI release > 0.3 (including 1.0+), but only with protocol version 0.3. That is incompatible with containerd 1.6+, which uses protocol version 1.0, and this broke my cluster when I upgraded containerd.

antoinep92 avatar Mar 12 '22 10:03 antoinep92

Uh okay, thank you for the hint. I am currently at containerd 1.5.9. Do you have logs of what happens with containerd 1.6?

amiga23 avatar Mar 12 '22 17:03 amiga23

Sorry, I don't have the logs anymore, but pods could no longer be created or deleted, and the errors complained about missing, unparsable or incompatible versions. The error itself was not very helpful, and I spent quite some time figuring out that the issue was due to containerd requiring a different CNI protocol version than weave.

antoinep92 avatar Mar 22 '22 20:03 antoinep92

Do you have logs what happens with containerd 1.6?

I'm guessing it's the same thing I'm hitting -

Warning  FailedCreatePodSandBox  4s (x10 over 118s)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "217da23c13e0c6565689328bc8bee1e4095a44ec7a4c427b347f0a085372743d": plugin type="portmap" failed (add): failed to parse config: could not parse prevResult: could not parse prevResult: result type supports [0.3.0 0.3.1 0.4.0] but unmarshalled CNIVersion is "0.1.0"
ubuntu@on-3:~$ containerd --version
containerd github.com/containerd/containerd v1.6.1 10f428dac7cec44c864e1b830a4623af27a9fc70
ubuntu@on-3:~$

Edit: Dang it, and it's an ARM64 box, and there is no containerd 1.5.x for ARM64 :-/ ... I'm using Kubespray so I'll have to try one of the other container runtimes.

hyacin75 avatar Mar 28 '22 21:03 hyacin75

I'm hitting the same. FWICT this got introduced by https://github.com/containernetworking/cni/commit/76bf3de7f892b5adac1b20bf6fb7a1e962ad0cd1. Runtimes which include this commit don't work with CNI plugins which are missing https://github.com/containernetworking/cni/commit/27a5b994c2a55d1fceca08ec88139b61d4ad55fd (from 2017!).

The issue is that without this, weave-net makes unversioned replies like

{
    "ips": [
        {
            "version": "4",
            "address": "10.32.0.2/12",
            "gateway": "10.32.0.1"
        }
    ],
    "dns": {}
}

which the runtime now interprets as having version 0.1.0. With the referenced commit added to weave's copy of cni, it is versioned again:

{
    "cniVersion": "0.3.0",
    "ips": [
        {
            "version": "4",
            "address": "10.32.0.6/12",
            "gateway": "10.32.0.1"
        }
    ],
    "dns": {}
}

Vogtinator avatar Mar 29 '22 13:03 Vogtinator

I attempted to update the cni version to 0.6.0, which contains the needed fixes but not the API break for cmdCheck. While that built properly, the weave container fails to start due to a panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x55fdf4037fbc]

goroutine 1 [running]:
main.isLocalNodeIP({0xc000191570, 0xd})
        /home/abuild/rpmbuild/BUILD/weave-2.8.1/prog/kube-utils/main.go:82 +0xbc
main.getKubePeers({0x55fdf4bd72d0, 0xc000372dc0}, 0x0)
        /home/abuild/rpmbuild/BUILD/weave-2.8.1/prog/kube-utils/main.go:62 +0x445
main.main()
        /home/abuild/rpmbuild/BUILD/weave-2.8.1/prog/kube-utils/main.go:405 +0x87f
Failed to get peers

So the only way to fix this is to cherry-pick the fix into the vendored copy for now: https://github.com/Vogtinator/weave/commit/ef8fa923030d9b6da3ca014689871b0108486e31

With that, the cluster comes up as expected.

Vogtinator avatar Mar 30 '22 11:03 Vogtinator

Hello!

I am facing the same issue on my homelab single-node cluster:

Warning FailedCreatePodSandBox 48s (x167 over 38m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_local-path-provisioner-566b877b9c-rh7nj_kube-system_534f4cf2-4e1f-4c8e-ae16-b04ebb4b4ba6_0(e19e036f21a730777e4f6fb77fdd3c605ca61ea9990007235749ef169fca2c39): error adding pod kube-system_local-path-provisioner-566b877b9c-rh7nj to CNI network "weave": plugin type="portmap" failed (add): failed to parse config: could not parse prevResult: could not parse prevResult: result type supports [0.3.0 0.3.1 0.4.0] but unmarshalled CNIVersion is "0.1.0"

Kubernetes and runtime versions:

$ k get nodes -o wide
NAME      STATUS   ROLES                  AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                               KERNEL-VERSION                CONTAINER-RUNTIME
pikachu   Ready    control-plane,master   95d   v1.23.1   192.168.100.152   <none>        Red Hat Enterprise Linux 8.5 (Ootpa)   4.18.0-348.7.1.el8_5.x86_64   cri-o://1.23.0

Which solutions do we have at this moment?

mrachuta avatar Apr 10 '22 12:04 mrachuta

Updated images (with github.com/containernetworking/[email protected]) are temporarily uploaded here:

  • https://hub.docker.com/r/alvistack/weave-kube
  • https://hub.docker.com/r/alvistack/weave-npc

Also see https://github.com/weaveworks/weave/pull/3939

hswong3i avatar Apr 13 '22 09:04 hswong3i

This seems to happen not only with containerd but also with cri-o. A release (incl. aarch64) with a fix would be much appreciated.

everflux avatar Apr 20 '22 18:04 everflux

Yeah I'm sorry, it indeed works with any CNI release > 0.3 (including 1.0+) but only with protocol version 0.3 which is incompatible with containerd 1.6+ which uses 1.0 protocol version, and this broke my cluster when upgrading containerd.

containerd should work with older CNI configs and plugins. We did have a change where we set up lo using a 1.0.0 config and the CNI loopback plugin; we reverted that to 0.3.1 in containerd 1.6.4. containerd is built against the CNI v1.0.1 library but should be backwards compatible.

let's see what we can do to fix these issues..

to CNI network "weave": plugin type="portmap" failed (add): failed to parse config: could not parse prevResult: could not parse prevResult: result type supports [0.3.0 0.3.1 0.4.0] but unmarshalled CNIVersion is "0.1.0"

nod, new CNI requires the config version to be specified on setup; otherwise it presumes 0.1.0...

cheers!

mikebrow avatar May 10 '22 20:05 mikebrow

Looks like it's not just that the config version needs to be specified: the plugin's setup result also needs to have the version in it, or CNI will convert the result to the wrong version and drop the important parts of the result.

If correct, weave needs a fix to add config version in result.. and cni needs a fix to use/try config version if the plugin did not provide the config version in the result...
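
The CNI-side half of that proposal could look roughly like this. It is a sketch of the idea only, not a change that was merged anywhere; `effective_result_version` is a hypothetical name.

```python
import json

def effective_result_version(raw_config, raw_result):
    """Pick the version to interpret a plugin result with.

    Order of preference: the result's own cniVersion, then the network
    config's cniVersion, then the legacy default 0.1.0.
    """
    result = json.loads(raw_result)
    config = json.loads(raw_config)
    return result.get("cniVersion") or config.get("cniVersion") or "0.1.0"
```

With a config declaring 0.3.0 and an unversioned result, this picks 0.3.0 instead of falling straight through to 0.1.0.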

mikebrow avatar May 11 '22 13:05 mikebrow

Doesn't https://github.com/weaveworks/weave/pull/3939 solve this issue?

hswong3i avatar May 11 '22 14:05 hswong3i

I was facing the same issue when deploying a brand new cluster. The core-dns pods wouldn't start with the error message below:

  Warning  FailedCreatePodSandBox  4m5s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1557f2385ba5c7a830a441c1abcd15a0d6026c1cc50d6ecad6f21be6bf3b215c": plugin type="portmap" failed (add): failed to parse config: could not parse prevResult: could not parse prevResult: result type supports [0.3.0 0.3.1 0.4.0] but unmarshalled CNIVersion is "0.1.0"

I was using the latest version of kubeadm/kubernetes (1.24.1), containerd (1.6.4), the CNI plugins (1.1.1) and weave (installed following the documentation).

I fixed the problem thanks to this issue by downgrading containerd to 1.5.12:

  • Downloaded the correct tar file
  • Stopped containerd service
  • Untarred containerd 1.5.12 over the previous install in /usr/local
  • Restarted the containerd service
  • Restarted the kubelet service
  • Core-dns pods started running
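
The steps above can be sketched as a dry-run script that only prints the commands. Several things here are assumptions and should be checked before use: the release URL layout, the `amd64` architecture, and that both services are systemd units named `containerd` and `kubelet`. The function name `downgrade_commands` is hypothetical.

```python
import shlex

def downgrade_commands(version="1.5.12", arch="amd64", prefix="/usr/local"):
    """Return the shell commands for the downgrade steps, without running them."""
    tarball = f"containerd-{version}-linux-{arch}.tar.gz"
    # Assumed release URL layout; verify against the containerd releases page.
    url = (
        "https://github.com/containerd/containerd/releases/download/"
        f"v{version}/{tarball}"
    )
    return [
        ["curl", "-fLO", url],                   # download the tar file
        ["systemctl", "stop", "containerd"],     # stop containerd
        ["tar", "-C", prefix, "-xzf", tarball],  # untar over the previous install
        ["systemctl", "restart", "containerd"],  # restart containerd
        ["systemctl", "restart", "kubelet"],     # restart kubelet
    ]

if __name__ == "__main__":
    # Print only; nothing is executed.
    for cmd in downgrade_commands():
        print(shlex.join(cmd))
```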

B-Souty avatar Jun 03 '22 20:06 B-Souty

This should be resolved by #3946 – I'm going to keep this open until I can see how it gets published as "latest", or at least see that it gets published.

kingdonb avatar Jun 09 '22 12:06 kingdonb

Let's re-purpose this issue, since I'm hearing from some folks that there are other compatibility issues that may impact Weave Net, e.g. iptables. I am using a mix of iptables 1.8.8 and 1.8.5 in my cluster with Weave Net, and don't seem to be having any issues. But there might be something nuanced in here, and it will be helpful to future users if we can document it.

  • https://github.com/weaveworks/weave/issues/3465#issuecomment-929816278

The important bit of information in there seems to be that iptables-legacy must be used instead of nf_tables, and that this mainly affects CentOS users so far.
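
To see which backend a node is on, the output of `iptables --version` can be parsed. This is a small sketch with a hypothetical helper name, assuming the 1.8+ output format where the backend appears in parentheses.

```python
import re

def iptables_backend(version_output):
    """Parse `iptables --version` output into (version, backend).

    iptables 1.8+ prints e.g. "iptables v1.8.5 (nf_tables)" or
    "iptables v1.8.4 (legacy)"; older releases print no backend, which
    is reported here as None.
    """
    match = re.search(r"v(\d+(?:\.\d+)*)(?:\s+\(([\w-]+)\))?", version_output)
    if not match:
        raise ValueError(f"unrecognised iptables version string: {version_output!r}")
    return match.group(1), match.group(2)
```

A backend of `nf_tables` would be the case described in the linked comment.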

Edit: when we have a good list going, we can close this issue by adding it to the docs. 👍

kingdonb avatar Aug 02 '22 12:08 kingdonb