Martian sources in kernel logs
What did you expect to happen?
The kernel logs should not be filled with martian source messages like these:
Jun 23 17:24:09 ip-10-207-56-143 kernel: IPv4: martian source 100.120.0.4 from 100.96.0.14, on dev datapath
Jun 23 17:24:09 ip-10-207-56-143 kernel: ll header: 00000000: ff ff ff ff ff ff 96 24 4d ed 2f 82 08 06 .......$M./...
What happened?
The kernel logs are filled with martian source messages:
Jun 23 17:24:09 ip-10-207-56-143 kernel: IPv4: martian source 100.120.0.4 from 100.96.0.14, on dev datapath
Jun 23 17:24:09 ip-10-207-56-143 kernel: ll header: 00000000: ff ff ff ff ff ff 96 24 4d ed 2f 82 08 06 .......$M./...
Jun 23 17:24:09 ip-10-207-56-143 kernel: IPv4: martian source 100.88.0.6 from 100.88.0.4, on dev datapath
Jun 23 17:24:09 ip-10-207-56-143 kernel: ll header: 00000000: ff ff ff ff ff ff 32 d8 8a 82 27 f4 08 06 ......2...'...
Jun 23 17:24:12 ip-10-207-56-143 kernel: IPv4: martian source 100.120.0.4 from 100.96.0.1, on dev datapath
Jun 23 17:24:12 ip-10-207-56-143 kernel: ll header: 00000000: ff ff ff ff ff ff 56 f2 78 41 79 70 08 06 ......V.xAyp..
Jun 23 17:24:39 ip-10-207-56-143 kernel: IPv4: martian source 100.88.0.6 from 100.96.0.5, on dev datapath
Jun 23 17:24:39 ip-10-207-56-143 kernel: ll header: 00000000: ff ff ff ff ff ff be 07 b1 8d de 70 08 06 ...........p..
Jun 23 17:24:42 ip-10-207-56-143 kernel: IPv4: martian source 100.64.0.6 from 100.64.0.4, on dev datapath
Jun 23 17:24:42 ip-10-207-56-143 kernel: ll header: 00000000: ff ff ff ff ff ff 66 87 3f 9e 9b 16 08 06 ......f.?.....
ip addr output
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 0e:c8:74:19:ed:53 brd ff:ff:ff:ff:ff:ff
inet 10.207.56.143/26 brd 10.207.56.191 scope global dynamic eth0
valid_lft 2933sec preferred_lft 2933sec
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:8c:cd:a8:4f brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8916 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 26:f6:dc:fc:6c:1f brd ff:ff:ff:ff:ff:ff
6: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8916 qdisc noqueue state UP group default qlen 1000
link/ether ca:b7:8d:a2:b9:49 brd ff:ff:ff:ff:ff:ff
inet 100.120.0.0/10 brd 100.127.255.255 scope global weave
valid_lft forever preferred_lft forever
7: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether a6:17:fe:3a:1d:bc brd ff:ff:ff:ff:ff:ff
15: vethwepl504d9ed@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8916 qdisc noqueue master weave state UP group default
link/ether 5e:77:6e:d4:5d:42 brd ff:ff:ff:ff:ff:ff link-netnsid 1
60: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8916 qdisc noqueue master datapath state UP group default
link/ether 76:d0:3c:f1:3c:ab brd ff:ff:ff:ff:ff:ff
61: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8916 qdisc noqueue master weave state UP group default
link/ether 82:ea:ee:a4:6a:ed brd ff:ff:ff:ff:ff:ff
94: vethweple9c0385@if93: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8916 qdisc noqueue master weave state UP group default
link/ether a6:4d:d7:9e:b3:da brd ff:ff:ff:ff:ff:ff link-netnsid 0
122: vethwepl53ae8e9@if121: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8916 qdisc noqueue master weave state UP group default
link/ether 6a:ad:68:2b:af:fa brd ff:ff:ff:ff:ff:ff link-netnsid 3
125: vxlan-6784: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue master datapath state UNKNOWN group default qlen 1000
link/ether de:bf:10:de:33:6e brd ff:ff:ff:ff:ff:ff
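For what it's worth, a quick way to see why the kernel flags these packets is to compare the interface a packet arrives on with the interface the return route for the logged source address points at. A sketch using one of the addresses from the kernel log above (the route output will differ per node):
# ip route get 100.96.0.14
# ip route show dev weave
# ip route show dev datapath
With strict reverse-path filtering (rp_filter = 1, see the sysctl output below), a packet whose source address routes back out a different interface than the one it arrived on fails the check, and with log_martians enabled it is logged as a martian source.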
sysctl output
# sysctl --system
* Applying /etc/sysctl.d/00-defaults.conf ...
kernel.printk = 8 4 1 7
kernel.panic = 30
net.ipv4.neigh.default.gc_thresh1 = 0
net.ipv4.neigh.default.gc_thresh2 = 15360
net.ipv4.neigh.default.gc_thresh3 = 16384
* Applying /usr/lib/sysctl.d/00-system.conf ...
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
* Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ...
* Applying /usr/lib/sysctl.d/50-default.conf ...
kernel.sysrq = 16
kernel.core_uses_pid = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
* Applying /etc/sysctl.d/99-amazon.conf ...
kernel.sched_autogroup_enabled = 0
* Applying /etc/sysctl.d/99-sysctl.conf ...
fs.suid_dumpable = 0
kernel.randomize_va_space = 2
net.ipv4.ip_forward = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.route.flush = 1
vm.max_map_count = 262144
* Applying /etc/sysctl.conf ...
fs.suid_dumpable = 0
kernel.randomize_va_space = 2
net.ipv4.ip_forward = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.route.flush = 1
vm.max_map_count = 262144
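The logging itself is controlled by net.ipv4.conf.*.log_martians together with the strict rp_filter = 1 shown above (the effective per-interface rp_filter is the maximum of the "all" and interface values, and log_martians is OR-ed across them). Purely as an experiment, and not a fix for the cross-peer frames in the weave logs further down, one could relax these on the Weave interfaces; whether that is acceptable for this setup is an assumption on my part:
# sysctl net.ipv4.conf.weave.rp_filter net.ipv4.conf.datapath.rp_filter
# sysctl -w net.ipv4.conf.weave.rp_filter=2
# sysctl -w net.ipv4.conf.datapath.rp_filter=2
# sysctl -w net.ipv4.conf.all.log_martians=0
# sysctl -w net.ipv4.conf.default.log_martians=0
# sysctl -w net.ipv4.conf.weave.log_martians=0
# sysctl -w net.ipv4.conf.datapath.log_martians=0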
How to reproduce it?
Anything else we need to know?
EKS
Versions:
$ weave version
2.6.5
$ docker version
19.03.6-ce
$ uname -a
4.14.173-137.229.amzn2.x86_64
$ kubectl version
1.16.8
Logs:
$ docker logs weave
or, if using Kubernetes:
$ kubectl logs -n kube-system <weave-net-pod> weave
$ kubectl logs -n kube-system weave-net-2zpjs weave
DEBU: 2020/06/23 17:16:52.803441 [kube-peers] Checking peer "96:bb:40:05:dd:db" against list &{[{96:bb:40:05:dd:db ip-10-207-56-229.ec2.internal} {ca:b7:8d:a2:b9:49 ip-10-207-56-143.ec2.internal} {86:8d:08:e5:0d:a6 ip-10-207-56-225.ec2.internal} {62:67:8f:44:45:ec ip-10-207-56-183.ec2.internal}]}
INFO: 2020/06/23 17:16:53.861097 [kube-peers] Added myself to peer list &{[{96:bb:40:05:dd:db ip-10-207-56-229.ec2.internal} {ca:b7:8d:a2:b9:49 ip-10-207-56-143.ec2.internal} {86:8d:08:e5:0d:a6 ip-10-207-56-225.ec2.internal} {62:67:8f:44:45:ec ip-10-207-56-183.ec2.internal}]}
DEBU: 2020/06/23 17:16:53.871917 [kube-peers] Nodes that have disappeared: map[]
100.96.0.0
DEBU: 2020/06/23 17:16:53.950276 registering for updates for node delete events
WARN: 2020/06/23 17:16:59.729336 Vetoed installation of hairpin flow FlowSpec{keys: [TunnelFlowKey{id: 0000000000651b8b, ipv4src: 10.207.56.225, ipv4dst: 10.207.56.229} InPortFlowKey{vport: 2}], actions: [SetTunnelAction{id: 0000000000651b8b, ipv4src: 10.207.56.229, ipv4dst: 10.207.56.225, tos: 0, ttl: 64, df: true, csum: false} OutputAction{vport: 2}]}
ERRO: 2020/06/23 17:17:13.717423 Captured frame from MAC (32:d8:8a:82:27:f4) to (ee:c8:05:7e:24:e0) associated with another peer 62:67:8f:44:45:ec(ip-10-207-56-183.ec2.internal)
ERRO: 2020/06/23 17:17:39.695821 Captured frame from MAC (66:87:3f:9e:9b:16) to (ae:b6:35:f1:9c:10) associated with another peer 86:8d:08:e5:0d:a6(ip-10-207-56-225.ec2.internal)
ERRO: 2020/06/23 17:17:42.676143 Captured frame from MAC (32:d8:8a:82:27:f4) to (ee:c8:05:7e:24:e0) associated with another peer 62:67:8f:44:45:ec(ip-10-207-56-183.ec2.internal)
ERRO: 2020/06/23 17:17:47.696985 Captured frame from MAC (66:87:3f:9e:9b:16) to (da:f0:7e:32:7a:4b) associated with another peer 86:8d:08:e5:0d:a6(ip-10-207-56-225.ec2.internal)
ERRO: 2020/06/23 17:21:53.124246 Captured frame from MAC (32:d8:8a:82:27:f4) to (ee:c8:05:7e:24:e0) associated with another peer 62:67:8f:44:45:ec(ip-10-207-56-183.ec2.internal)
ERRO: 2020/06/23 17:26:53.124360 Captured frame from MAC (32:d8:8a:82:27:f4) to (ee:c8:05:7e:24:e0) associated with another peer 62:67:8f:44:45:ec(ip-10-207-56-183.ec2.internal)
ERRO: 2020/06/23 17:31:53.124488 Captured frame from MAC (32:d8:8a:82:27:f4) to (ee:c8:05:7e:24:e0) associated with another peer 62:67:8f:44:45:ec(ip-10-207-56-183.ec2.internal)
ERRO: 2020/06/23 17:36:53.124523 Captured frame from MAC (32:d8:8a:82:27:f4) to (ee:c8:05:7e:24:e0) associated with another peer 62:67:8f:44:45:ec(ip-10-207-56-183.ec2.internal)
ERRO: 2020/06/23 17:41:53.124511 Captured frame from MAC (32:d8:8a:82:27:f4) to (ee:c8:05:7e:24:e0) associated with another peer 62:67:8f:44:45:ec(ip-10-207-56-183.ec2.internal)
Network:
$ ip route
$ ip -4 -o addr
$ sudo iptables-save
Related: #3327
I found similar log entries (but for calico), googled for a solution, and found this issue.
I was able to resolve the problem by changing the network ranges. We were using the 192.168.x range; in this case the range 100.120.0.0/10 is used. Both ranges are special IP ranges...
https://en.wikipedia.org/wiki/Reserved_IP_addresses
They are "special" in the sense that they don't route on the public Internet, which makes them good for use inside Weave Net.
Sure, I didn't dig deeper because I was happy that it finally worked. I had installed Cisco Container Platform 8.0.0 on a nested vSphere environment and simply tried this trick. All of a sudden I had no more issues with martian packets, so I stopped researching the root cause.
If you have a better explanation for this behavior, I would be grateful to learn more!
Could you say what exactly worked? You said you changed the range, and what it was before, but not what you changed it to.
We don't have any explanation, which is why this issue and #3327 are open.
Before, we had the following settings:
pod network cidr: 192.169.0.0/16
node network: "192.168.200.0/24"
gateway_ip: "192.168.200.254"
Now we have:
pod network: 192.168.0.0/16
node network: "10.98.0.0/20"
gateway_ip: "10.98.0.1"
But I have to double-check the first pod network CIDR...
As you can see, we also use a private IP range as the container network. I am not sure whether it happens because both ranges are private... maybe this routing suppression is also active at the OS level...
These are just some thoughts off the top of my head. Maybe you can easily invalidate this explanation by switching the network ranges in your tests, as I did.
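For the Kubernetes/EKS setup in this issue, switching the range would normally mean setting the IPALLOC_RANGE environment variable on the weave container of the weave-net DaemonSet. A sketch only: the replacement range below is a made-up example, and changing the range on a cluster that already has allocated pod addresses needs care.
$ kubectl -n kube-system edit daemonset weave-net
    env:
    - name: IPALLOC_RANGE
      value: "10.40.0.0/16"   # hypothetical replacement range, pick one that does not overlap the node network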
Any updates to this issue?
We intermittently see similar entries in the EC2 instance's syslog, and networking doesn't work correctly within the pods (we usually notice because name resolution fails).
When this happens for us on EKS, we end up having to rotate the node (sometimes going through several new instances) before traffic works in the pods. A new EC2 instance seems to be the only way to get a fully functioning node.
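For reference, the rotation is just the usual drain-and-replace sequence (a sketch; the instance ID is a placeholder, and the node group brings up the replacement instance):
$ kubectl drain ip-10-207-56-143.ec2.internal --ignore-daemonsets --delete-local-data
$ aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
$ kubectl get nodes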