weave Cause of "Captured frame from MAC ... associated with another peer"

Creating this based on https://github.com/weaveworks/weave/issues/2877#issuecomment-436918038

What you expected to happen?

When a new node joins the cluster, no existing node should become unroutable.

What happened?

Today we got one more unroutable alert for one of the Kubernetes nodes (10.2.20.238). The node became unroutable just after a new node, 10.2.20.227, joined the cluster.

When I say healthy or routable, I mean that curl node_ip:node_port/endpoint has started working.

Events

I0227 05:39:49 > 10.2.20.227 node add event
I0227 05:40:23 > 10.2.20.238 node unhealthy event 👎 (continuously unhealthy till 05:51:54)
I0227 05:50:25 > 10.2.20.227 became healthy for the first time (routable) 👍
I0227 05:51:54 > 10.2.20.238 got healthy 👍
I0227 06:01:15 > 10.2.20.227 node delete event

How to reproduce it?

Not sure. Maybe it can be reproduced if a node with this IP joins the current network again. I am keeping an eye on it and will update here when I see a pattern or am able to reproduce it.

Anything else we need to know?

The cluster was created with kops 1.15.0.

Versions:

$ weave version
2.6.0
$ docker version
18.06.3-ce
$ uname -a
Linux ip-10-2-21-229 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20) x86_64 GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:07:57Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

MTU Setting

admin@ip-10-2-20-238:~$ sudo ifconfig| grep -i MTU | grep -v veth
datapath: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8912
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
vxlan-6784: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65485
weave: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8912

Logs:

Weave logs of 10.2.20.238, the node that became unhealthy: https://gist.github.com/alok87/5b99d5b07b01306c5f1f34c3eb0f1025

If you check the weave log of 10.2.20.238 above, it is filled with "Captured frame from MAC" errors after 10.2.20.227 joined the cluster, and 10.2.20.238 was continuously unhealthy after that.

cat weave-net-4zggw-238.log | grep "2020/02/27 05:4" | grep  "Captured frame from MAC" | grep 227 | wc -l
541

cat weave-net-4zggw-238.log | grep "2020/02/27 05:2" | grep  "Captured frame from MAC" | wc -l
0

So there were 541 of these errors for node 227 alone in the 05:40 to 05:49 window, and none at all in the 05:20 to 05:29 window before it joined.
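The two greps above each cover a single ten-minute window. A per-minute breakdown can make the onset easier to line up with the node add event. The following is only a rough sketch: it assumes every log line carries a timestamp shaped like `2020/02/27 05:40:23` (which the grep patterns above suggest), and `count_captured_per_minute` is a hypothetical helper name, not part of weave.

```shell
# Count "Captured frame from MAC" lines per minute in a weave log.
# Assumption: each log line contains a timestamp shaped like
# "2020/02/27 05:40:23", as the grep patterns above imply.
# count_captured_per_minute is a hypothetical helper name.
count_captured_per_minute() {
  awk '
    /Captured frame from MAC/ {
      # Keep only "YYYY/MM/DD HH:MM" so lines group by minute.
      if (match($0, /[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]/))
        counts[substr($0, RSTART, RLENGTH)]++
    }
    END { for (minute in counts) print minute, counts[minute] }
  ' "$1" | sort
}
```

Running `count_captured_per_minute weave-net-4zggw-238.log` would print one "minute count" pair per line, sorted by time. Piping the log through `grep 227` first would restrict the count to lines mentioning that node, as in the first command above.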

Feb 27 '20 09:02 alok87

Hi @alok87, we are facing a similar issue. Were you able to find anything about it?

Dec 14 '20 03:12 xtroncode

No

Dec 14 '20 05:12 alok87

Having similar issues using this image weaveworks/weave-kube:2.8.1 and k8s 1.19.9

Jun 08 '21 19:06 nicolasdonoso

Same issues, weave 2.8.1, k8s 1.20.9

Aug 25 '21 23:08 pulberg

Also seeing the same error message with Weave 2.8.1. We are not seeing this behavior in our clusters still running Weave 2.7.0.

EDIT: I believe our problems were mostly because we upgraded to Weave 2.8 without using the new DaemonSet that was introduced. So we were using the DaemonSet for v2.7 with the 2.8 image of Weave.

Sep 28 '21 07:09 nyxi