
Weave-net CNI does not work on containerd=1.6.4

Open fuzhibo opened this issue 2 years ago • 6 comments

What you expected to happen?

Using Weave-net CNI for kubernetes=v1.20.1

What happened?

The weave-net CNI does not work with kubernetes=v1.20.1 and containerd.io=1.6.4: the veth interfaces for containers cannot be created.

How to reproduce it?

Upgrade containerd.io to 1.6.4

Anything else we need to know?

When I downgrade to containerd.io=1.5.11, everything works.

Versions:

```console
$ weave version
weave 2.8.1
$ docker version
Client: Docker Engine - Community
 Version:           20.10.16
 API version:       1.41
 Go version:        go1.17.10
 Git commit:        aa7e414
 Built:             Thu May 12 09:17:28 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.16
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.10
  Git commit:       f756502
  Built:            Thu May 12 09:15:33 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.11
  GitCommit:        3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ uname -a

Linux k8s-master 4.15.0-177-generic #186-Ubuntu SMP Thu Apr 14 20:23:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:09:25Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:00:47Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
```
## Logs:

$ docker logs weave

or, if using Kubernetes:

$ kubectl logs -n kube-system weave

<!-- (If output is long, please consider a Gist.) -->
<!-- Anything interesting or unusual output by the below, potentially relevant, commands?
$ journalctl -u docker.service --no-pager
$ journalctl -u kubelet --no-pager
May 12 14:59:51 k8s-master containerd[2087]: time="2022-05-12T14:59:51.590299653+08:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:fluentd-k26wt,Uid:cbe02001-06b7-4b8c-872e-7d559b61395c,Namespace:kube-system,Attempt:0,}"

May 12 14:59:51 k8s-master kernel: [11607.056092] weave: port 2(vethwepl73cc41b) entered blocking state

May 12 14:59:51 k8s-master kernel: [11607.056096] weave: port 2(vethwepl73cc41b) entered disabled state

May 12 14:59:51 k8s-master kernel: [11607.056207] device vethwepl73cc41b entered promiscuous mode

May 12 14:59:51 k8s-master systemd-udevd[22597]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.

May 12 14:59:51 k8s-master systemd-udevd[22598]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.

May 12 14:59:51 k8s-master networkd-dispatcher[1866]: WARNING:Unknown index 5167 seen, reloading interface list

May 12 14:59:51 k8s-master systemd-udevd[22598]: Could not generate persistent MAC address for vethwepl73cc41b: No such file or directory

May 12 14:59:51 k8s-master kernel: [11607.085723] eth0: renamed from vethwepg73cc41b

May 12 14:59:51 k8s-master systemd-udevd[22597]: link_config: could not get ethtool features for vethwepg73cc41b

May 12 14:59:51 k8s-master systemd-udevd[22597]: Could not set offload features of vethwepg73cc41b: No such device

May 12 14:59:51 k8s-master networkd-dispatcher[1866]: ERROR:Unknown interface index 5167 seen even after reload

May 12 14:59:51 k8s-master libvirtd[2076]: 2022-05-12 06:59:51.985+0000: 2831: error : virFileReadAll:1420 : Failed to open file '/sys/class/net/vethwepg73cc41b/operstate': No such file or directory

May 12 14:59:51 k8s-master libvirtd[2076]: 2022-05-12 06:59:51.985+0000: 2831: error : virNetDevGetLinkInfo:2530 : unable to read: /sys/class/net/vethwepg73cc41b/operstate: No such file or directory

May 12 14:59:52 k8s-master systemd-networkd[1618]: vethwepl73cc41b: Link UP

May 12 14:59:52 k8s-master kernel: [11607.219944] IPv6: ADDRCONF(NETDEV_UP): vethwepl73cc41b: link is not ready

May 12 14:59:52 k8s-master systemd-networkd[1618]: vethwepl73cc41b: Gained carrier

May 12 14:59:52 k8s-master kernel: [11607.226183] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

May 12 14:59:52 k8s-master kernel: [11607.226199] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

May 12 14:59:52 k8s-master kernel: [11607.226237] IPv6: ADDRCONF(NETDEV_CHANGE): vethwepl73cc41b: link becomes ready

May 12 14:59:52 k8s-master kernel: [11607.226293] weave: port 2(vethwepl73cc41b) entered blocking state

May 12 14:59:52 k8s-master kernel: [11607.226295] weave: port 2(vethwepl73cc41b) entered forwarding state

May 12 14:59:52 k8s-master systemd-networkd[1618]: vethwepl73cc41b: Link DOWN

May 12 14:59:52 k8s-master systemd-networkd[1618]: vethwepl73cc41b: Lost carrier

May 12 14:59:52 k8s-master kernel: [11607.299911] weave: port 2(vethwepl73cc41b) entered disabled state

May 12 14:59:52 k8s-master kernel: [11607.303493] device vethwepl73cc41b left promiscuous mode

May 12 14:59:52 k8s-master kernel: [11607.303495] weave: port 2(vethwepl73cc41b) entered disabled state

May 12 14:59:52 k8s-master containerd[2087]: time="2022-05-12T14:59:52.223538455+08:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:fluentd-k26wt,Uid:cbe02001-06b7-4b8c-872e-7d559b61395c,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993\": failed to find network info for sandbox \"73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993\""

May 12 14:59:52 k8s-master kubelet[2063]: E0512 14:59:52.224570    2063 remote_runtime.go:116] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to setup network for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993": failed to find network info for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993"

May 12 14:59:52 k8s-master kubelet[2063]: E0512 14:59:52.224657    2063 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "fluentd-k26wt_kube-system(cbe02001-06b7-4b8c-872e-7d559b61395c)" failed: rpc error: code = Unknown desc = failed to setup network for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993": failed to find network info for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993"

May 12 14:59:52 k8s-master kubelet[2063]: E0512 14:59:52.224683    2063 kuberuntime_manager.go:755] createPodSandbox for pod "fluentd-k26wt_kube-system(cbe02001-06b7-4b8c-872e-7d559b61395c)" failed: rpc error: code = Unknown desc = failed to setup network for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993": failed to find network info for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993"

May 12 14:59:52 k8s-master kubelet[2063]: E0512 14:59:52.224767    2063 pod_workers.go:191] Error syncing pod cbe02001-06b7-4b8c-872e-7d559b61395c ("fluentd-k26wt_kube-system(cbe02001-06b7-4b8c-872e-7d559b61395c)"), skipping: failed to "CreatePodSandbox" for "fluentd-k26wt_kube-system(cbe02001-06b7-4b8c-872e-7d559b61395c)" with CreatePodSandboxError: "CreatePodSandbox for pod \"fluentd-k26wt_kube-system(cbe02001-06b7-4b8c-872e-7d559b61395c)\" failed: rpc error: code = Unknown desc = failed to setup network for sandbox \"73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993\": failed to find network info for sandbox \"73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993\""

May 12 15:00:01 k8s-master kernel: [11616.961093] sh (22763): drop_caches: 3

May 12 15:00:01 k8s-master kernel: [11616.962742] sh (22760): drop_caches: 3

May 12 15:00:01 k8s-master kernel: [11616.964162] sh (22759): drop_caches: 3
$ kubectl get events
-->

## Network:
<!-- If your problem has anything to do with one network endpoint not being able to contact another, please run the following commands -->

$ ip route
$ ip -4 -o addr
$ sudo iptables-save

fuzhibo avatar May 16 '22 03:05 fuzhibo
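
For anyone triaging a similar failure, the journal excerpt in the report can be filtered mechanically. The following is a minimal sketch, not part of the original report: `failed_sandboxes` is a hypothetical helper name, and it assumes the runtime logs the same "failed to find network info for sandbox" message with a 64-character hex sandbox ID, as in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): read log lines on stdin
# and print the unique sandbox IDs whose network setup failed.
# Assumes the message text and the 64-hex-character ID format seen in the excerpt.
failed_sandboxes() {
  grep 'failed to find network info for sandbox' \
    | grep -o '[0-9a-f]\{64\}' \
    | sort -u
}

# Example usage (commented out so that sourcing this file has no side effects):
#   journalctl -u containerd -u kubelet --no-pager | failed_sandboxes
```

Each failing line repeats the same ID several times, so `sort -u` is there to print every sandbox only once.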

Up for this Issue.

busyboy77 avatar May 24 '22 18:05 busyboy77

For more details, see:

  • https://github.com/weaveworks/weave/issues/3936
  • https://github.com/containerd/containerd/issues/6921

I'm not sure if it's weave's fault or containerd's fault; if it's weave's I guess they'll want to fix it; if it's containerd's they'll want to advocate and explain why 😅

jpetazzo avatar May 25 '22 06:05 jpetazzo

Can confirm as well on Kubernetes v1.24.1 with weave 2.8.1: downgrading to containerd.io=1.5.11-1 solved the issue here too.

boskoop avatar May 30 '22 20:05 boskoop
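
On a Debian or Ubuntu host using Docker's `containerd.io` package, the downgrade workaround reported in this thread could look roughly like the sketch below. It rests on assumptions: the version string `1.5.11-1` is taken from the comment above, but the exact string offered by your apt repository may differ, and `pin_containerd_commands` is a hypothetical helper that only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print, but do not execute,
# the commands that would downgrade and hold the containerd.io package.
# The default version string is an assumption; check `apt-cache madison containerd.io`.
PKG_VERSION="${PKG_VERSION:-1.5.11-1}"

pin_containerd_commands() {
  # Install the requested version, allowing apt to go backwards.
  printf 'sudo apt-get install -y --allow-downgrades containerd.io=%s\n' "$1"
  # Hold the package so a routine upgrade does not move it forward again.
  printf 'sudo apt-mark hold containerd.io\n'
  # Restart the runtime so the downgraded binary is the one in use.
  printf 'sudo systemctl restart containerd\n'
}

pin_containerd_commands "$PKG_VERSION"
```

Once a fixed containerd release is installed, `sudo apt-mark unhold containerd.io` removes the hold.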

And I am sure that by the time they figure out whose fault it was, people will have forgotten weave altogether on Kubernetes. That is my guess after seeing that weave was last updated about a year ago.

busyboy77 avatar May 31 '22 13:05 busyboy77

If there is someone who has the chops and wants to see Weave net maintained, send your PRs @ me and I will help you to try and get them merged.

My understanding from https://github.com/containerd/containerd/issues/6921#issuecomment-1146680225 is that this all works again, thanks to a change from upstream which has resolved the backwards-incompatible changes in CNI.

Which means, of course, that people can install weave net again (though they might be at risk, with no maintainers actively pushing out releases).

(Edit: the discussion in https://github.com/weaveworks/weave/pull/3939 is a good place to start if you haven't seen it yet.)

kingdonb avatar Jun 06 '22 12:06 kingdonb

Meanwhile, I can confirm that weave net works as-is on containerd 1.6.6, with Kubernetes 1.24, 1.23 and 1.22.

rajch avatar Jun 07 '22 02:06 rajch
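
The data points reported in this thread can be summarized as a small lookup. This is only a sketch: `reported_status` is a hypothetical helper, it covers only the three containerd versions mentioned by commenters here (with weave 2.8.1), and it makes no claim about any other release.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): map a containerd version to what
# commenters in this thread reported for weave 2.8.1. Other versions are unknown.
reported_status() {
  case "$1" in
    1.5.11|1.5.11-*) echo "reported working" ;;
    1.6.4|1.6.4-*)   echo "reported broken" ;;
    1.6.6|1.6.6-*)   echo "reported working" ;;
    *)               echo "not reported in this thread" ;;
  esac
}

# Example usage (commented out; assumes the version is the third field of
# `containerd --version`, possibly with a leading "v"):
#   reported_status "$(containerd --version | awk '{print $3}' | sed 's/^v//')"
```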