Investigate martian packets
From time to time, in kernel logs we see:
[ 2717.970445] IPv4: martian source 10.32.0.5 from 10.32.0.2, on dev datapath
[ 2717.970446] ll header: 00000000: ff ff ff ff ff ff 06 69 66 10 db 3f 08 06
Some execution paths in the kernel (e.g. https://github.com/torvalds/linux/blob/v4.17/net/ipv4/route.c#L1699) suggest that a martian packet can be dropped after the kernel has logged it.
Investigate:
- Why the kernel logs martian packets only sometimes (see the sysctl notes below).
- What the impact of martian packets is on Weave network stability.
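Two kernel-side knobs likely explain the intermittent logging (a hedged starting point for the first question, not a confirmed answer): the kernel only logs martians when log_martians is enabled for the receiving interface, and even then the messages go through the printk rate limiter, so bursts show up only partially (see the "net_ratelimit: ... callbacks suppressed" lines later in this thread).

```sh
# Check whether martian logging is enabled (per-interface plus the "all" switch):
sysctl net.ipv4.conf.all.log_martians net.ipv4.conf.default.log_martians

# Enable it on all interfaces (runtime only) while investigating:
sudo sysctl -w net.ipv4.conf.all.log_martians=1
```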
I am seeing this issue as well
@taemon1337 Do you observe any packet loss?
We see lots of martian packets from the kernel and retransmits with iperf. We are running bare metal on CentOS 7. Our average throughput with iperf on each node pair is < 1 Gbps, whereas localhost reaches 10 Gbps.
Could you paste martian packet logs from dmesg?
I can't actually post them directly, but here is the text line: IPv4: martian source 10.44.0.2 from 10.40.0.0, on dev datapath
One of our engineers decoded them and said they were ARP packets
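That matches the ll header hex dumps in this thread, which decode as plain Ethernet headers. Taking the one from the issue description as an example:

```
ff ff ff ff ff ff   destination MAC: Ethernet broadcast
06 69 66 10 db 3f   source MAC
08 06               EtherType 0x0806 = ARP
```

i.e. broadcast ARP frames, consistent with what the engineer found.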
We are seeing the same on our new setup with a mix of bare metal and VM nodes
Aug 7 18:10:15 node655 kernel: IPv4: martian source 172.16.64.2 from 172.16.248.0, on dev datapath
Aug 7 18:10:15 node655 kernel: ll header: 00000000: ff ff ff ff ff ff be 9f b5 37 a7 72 08 06 .........7.r..
Aug 7 18:10:16 node655 kernel: IPv4: martian source 172.16.64.2 from 172.16.248.0, on dev datapath
Can it be due to two NICs on the bare-metal nodes?
@paphillon I don't see how two NICs could cause it. What is your Weave Net IP address range? 172.16.0.0/16?
@brb Yes, that is the Weave Net IP address range. What we have noticed is that the martian-source warnings flood specifically when one or more nodes are unhealthy due to network, CPU, or infrastructure issues. This flood then seems to slow down other nodes as well, possibly because of the extra network traffic.
We hit this issue again this morning and it caused an outage. Around 2:45 am PT there was a network outage for 4 minutes; services recovered quickly since the delay in DNS lookups was intermittent. On further debugging we saw the errors below being logged almost continuously in the OS logs.
kernel: IPv4: martian source 172.16.16.22 from 172.16.72.0, on dev datapath
kernel: ll header: 00000000: ff ff ff ff ff ff 22 83 2d 0c 5a dc 08 06 ......".-.Z...
The source is CoreDNS, and to resolve the problem we have to restart that particular CoreDNS pod. Not sure how it is linked to Weave Net, but interestingly we ONLY see this issue in our non-prod and prod environments, never in dev.
The only differences in dev are that all the VMs have a single NIC and Weave Net uses the 172.200.0.0/24 IP range, whereas non-prod and prod have a mix of VMs and bare metal with 2 NICs and use the 172.16.0.0/16 range for Weave.
@brb / @bboreham - Any leads on resolving this issue? These are ARP packets, and searching CoreDNS-related issues did not turn up anything.
Not a lead on resolution, but a couple of thoughts.
It is interesting that it's the same pattern in all three reported cases: the packet is received on the datapath device and it is ARP traffic.
Since these are not non-routable IPs, it is a case of receiving an IP that is invalid for the interface. For example, given
kernel: IPv4: martian source 172.16.16.22 from 172.16.72.0, on dev datapath
the networking stack is not expecting packets with IP 172.16.72.0 on the datapath device.
Or it is a case of reverse traffic going out through a different interface than the one on which the packet arrived; the rp_filter notes below are relevant here.
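For the reverse-path case, the relevant sysctl is rp_filter (0 = off, 1 = strict, 2 = loose): in strict mode the kernel flags packets whose source address would not be routed back out the interface they arrived on. A quick check on an affected node (a sketch; device names taken from the logs in this thread):

```sh
# The effective value is the stricter (max) of "all" and the per-interface
# setting; strict mode on the weave/datapath devices is a common martian trigger.
sysctl net.ipv4.conf.all.rp_filter \
       net.ipv4.conf.datapath.rp_filter \
       net.ipv4.conf.weave.rp_filter
```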
If you observe martian packets again, can you please make a note of the nodes on which the source and destination pods are running, the routes on those nodes, and the weave report of both nodes?
@murali-reddy - I do have some logs captured from that event, except the weave report, if that helps.
Quick question: we are planning to have CoreDNS listen on the host network instead of the overlay IP, since the offending source address has always been associated with a CoreDNS pod IP. Do you think it may help?
Our dev cluster has cluster CIDR 172.200.0.0/24 while stage and prod are 172.16.0.0/16. We have never encountered this issue in dev, while everything else remains the same. Do you think this can be a contributing factor?
We don't have a test case yet to reproduce this issue, and by itself it's a rare event. Unfortunately, that means waiting until it happens again and causes an outage, so I am trying to stay ahead of it if possible.
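In the meantime, one low-cost way to stay ahead of it is to leave a rotating capture running on the datapath device, so the offending frames are recorded the next time the martians appear (a sketch; the output file name is just an example):

```sh
# Capture broadcast ARP on the datapath device, rotating the file hourly
# so it can run unattended.
tcpdump -i datapath -n -G 3600 -w martians-%s.pcap \
    'arp and ether dst ff:ff:ff:ff:ff:ff'
```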
172.16.16.22 & 172.16.184.4 are CoreDNS pod IPs.
Mar 19 20:45:22 or1dra658 kernel: IPv4: martian source 172.16.16.22 from 172.16.72.0, on dev datapath
Mar 19 20:45:22 or1dra658 kernel: ll header: 00000000: ff ff ff ff ff ff 22 83 2d 0c 5a dc 08 06 ......".-.Z...
Mar 19 20:45:22 or1dra658 kernel: IPv4: martian source 172.16.184.4 from 172.16.216.0, on dev datapath
Mar 19 20:45:22 or1dra658 kernel: ll header: 00000000: ff ff ff ff ff ff ea a4 90 36 8c e4 08 06 .........6....
Mar 19 20:45:23 or1dra658 kernel: IPv4: martian source 172.16.16.22 from 172.16.72.0, on dev datapath
Mar 19 20:45:23 or1dra658 kernel: ll header: 00000000: ff ff ff ff ff ff 22 83 2d 0c 5a dc 08 06 ......".-.Z...
Mar 19 20:45:23 or1dra658 kernel: IPv4: martian source 172.16.184.4 from 172.16.216.0, on dev datapath
Mar 19 20:45:23 or1dra658 kernel: ll header: 00000000: ff ff ff ff ff ff ea a4 90 36 8c e4 08 06 .........6....
Mar 19 20:45:26 or1dra658 kernel: IPv4: martian source 172.16.16.22 from 172.16.72.0, on dev datapath
Mar 19 20:45:26 or1dra658 kernel: ll header: 00000000: ff ff ff ff ff ff 22 83 2d 0c 5a dc 08 06 ......".-.Z...
Mar 19 20:45:26 or1dra658 kernel: IPv4: martian source 172.16.184.4 from 172.16.216.0, on dev datapath
Mar 19 20:45:26 or1dra658 kernel: ll header: 00000000: ff ff ff ff ff ff ea a4 90 36 8c e4 08 06 .........6....
Weave Net logs from the host where the martian errors were seen:
DEBU: 2019/03/19 18:51:15.150194 [kube-peers] Checking peer "ea:a4:90:36:8c:e4" against list &{[{7e:5f:8a:45:61:51 xy1010050035011.corp.xy.com} {96:3b:8e:25:64:6a xy1010050035012.corp.xy.com} {46:c1:59:91:d0:ea xy1010050035007.corp.xy.com} {e6:18:fc:c6:c0:c0 xy1010050035010.corp.xy.com} {12:f2:33:54:1b:a6 xy1010050035009.corp.xy.com} {66:1a:53:fc:8e:ea xy1010050035014.corp.xy.com} {be:9f:b5:37:a7:72 xy1dra655.corp.xy.com} {92:8b:90:4e:79:09 xy1dra656.corp.xy.com} {b2:b8:10:72:3f:74 xy1dra657.corp.xy.com} {ea:a4:90:36:8c:e4 xy1dra658.corp.xy.com} {2e:3e:0c:1a:b3:61 xy1010050034200.corp.xy.com} {22:83:2d:0c:5a:dc xy1010050034204.corp.xy.com} {aa:de:18:e8:93:8f xy1010050034205.corp.xy.com} {a6:9a:26:c8:46:c6 xy1010050035008.corp.xy.com} {12:30:75:29:cd:92 xy1010050035019.corp.xy.com}]}
INFO: 2019/03/19 18:51:18.731217 Command line options: map[no-dns:true port:6783 host-root:/host http-addr:127.0.0.1:6784 ipalloc-init:consensus=15 metrics-addr:0.0.0.0:6782 db-prefix:/weavedb/weave-net name:ea:a4:90:36:8c:e4 conn-limit:40 expect-npc:true mtu:1337 datapath:datapath docker-api: ipalloc-range:172.16.0.0/16 nickname:xy1dra658.corp.xy.com]
INFO: 2019/03/19 18:51:18.731344 weave 2.5.1
INFO: 2019/03/19 18:51:20.125085 Re-exposing 172.16.216.0/16 on bridge "weave"
INFO: 2019/03/19 18:51:20.325415 Bridge type is bridged_fastdp
INFO: 2019/03/19 18:51:20.325451 Communication between peers is unencrypted.
INFO: 2019/03/19 18:51:20.537983 Our name is ea:a4:90:36:8c:e4(xy1dra658.corp.xy.com)
INFO: 2019/03/19 18:51:20.538062 Launch detected - using supplied peer list: [10.50.34.200 10.50.34.204 10.50.34.205 10.50.35.7 10.50.35.8 10.50.35.9 10.50.35.10 10.50.35.11 10.50.35.12 10.50.35.14 10.50.35.19 10.50.34.196 10.50.34.197 10.50.34.198 10.50.34.199]
INFO: 2019/03/19 18:51:20.628649 Checking for pre-existing addresses on weave bridge
INFO: 2019/03/19 18:51:20.629132 weave bridge has address 172.16.216.0/16
INFO: 2019/03/19 18:51:22.024348 Found address 172.16.88.8/16 for ID _
INFO: 2019/03/19 18:51:22.025330 Found address 172.16.88.8/16 for ID _
INFO: 2019/03/19 18:51:22.026233 Found address 172.16.88.12/16 for ID _
INFO: 2019/03/19 18:51:22.026686 Found address 172.16.88.12/16 for ID _
INFO: 2019/03/19 18:51:22.028593 Found address 172.16.88.12/16 for ID _
INFO: 2019/03/19 18:51:22.127137 Found address 172.16.88.10/16 for ID _
INFO: 2019/03/19 18:51:22.127606 Found address 172.16.88.10/16 for ID _
INFO: 2019/03/19 18:51:22.128069 Found address 172.16.88.10/16 for ID _
INFO: 2019/03/19 18:51:22.129259 [allocator ea:a4:90:36:8c:e4] Initialising with persisted data
INFO: 2019/03/19 18:51:22.129470 Sniffing traffic on datapath (via ODP)
INFO: 2019/03/19 18:51:22.130327 ->[10.50.34.199:6783] attempting connection
INFO: 2019/03/19 18:51:22.130389 ->[10.50.34.197:6783] attempting connection
INFO: 2019/03/19 18:51:22.130513 ->[10.50.34.205:6783] attempting connection
INFO: 2019/03/19 18:51:22.130693 ->[10.50.35.14:6783] attempting connection
INFO: 2019/03/19 18:51:22.130876 ->[10.50.35.19:6783] attempting connection
INFO: 2019/03/19 18:51:22.130978 ->[10.50.34.200:6783] attempting connection
INFO: 2019/03/19 18:51:22.131138 ->[10.50.35.9:6783] attempting connection
INFO: 2019/03/19 18:51:22.131277 ->[10.50.35.10:6783] attempting connection
INFO: 2019/03/19 18:51:22.131421 ->[10.50.35.8:6783] attempting connection
INFO: 2019/03/19 18:51:22.131548 ->[10.50.35.7:6783] attempting connection
INFO: 2019/03/19 18:51:22.131649 ->[10.50.34.197:6783|92:8b:90:4e:79:09(xy1dra656.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:22.131754 ->[10.50.34.204:6783] attempting connection
INFO: 2019/03/19 18:51:22.131833 overlay_switch ->[92:8b:90:4e:79:09(xy1dra656.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:22.131899 ->[10.50.35.12:6783] attempting connection
INFO: 2019/03/19 18:51:22.131938 ->[10.50.34.205:6783|aa:de:18:e8:93:8f(xy1010050034205.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:22.132057 overlay_switch ->[aa:de:18:e8:93:8f(xy1010050034205.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:22.132075 ->[10.50.34.198:6783] attempting connection
INFO: 2019/03/19 18:51:22.132233 ->[10.50.35.11:6783] attempting connection
INFO: 2019/03/19 18:51:22.132354 ->[10.50.34.196:6783] attempting connection
INFO: 2019/03/19 18:51:22.132521 ->[10.50.34.199:54990] connection accepted
INFO: 2019/03/19 18:51:22.132712 ->[10.50.34.197:6783|92:8b:90:4e:79:09(xy1dra656.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:22.133359 ->[10.50.34.205:6783|aa:de:18:e8:93:8f(xy1010050034205.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:22.223552 ->[10.50.34.200:6783|2e:3e:0c:1a:b3:61(xy1010050034200.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:22.224675 overlay_switch ->[2e:3e:0c:1a:b3:61(xy1010050034200.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:22.224729 ->[10.50.34.200:6783|2e:3e:0c:1a:b3:61(xy1010050034200.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:22.623817 ->[10.50.35.10:6783|e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:22.724729 overlay_switch ->[e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:22.724900 ->[10.50.35.10:6783|e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:22.825018 ->[10.50.35.19:6783|12:30:75:29:cd:92(xy1010050035019.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:22.843799 ->[10.50.34.198:6783|b2:b8:10:72:3f:74(xy1dra657.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:22.923550 ->[10.50.35.8:6783|a6:9a:26:c8:46:c6(xy1010050035008.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:22.923660 overlay_switch ->[b2:b8:10:72:3f:74(xy1dra657.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:22.923721 overlay_switch ->[12:30:75:29:cd:92(xy1010050035019.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:22.923875 ->[10.50.34.198:6783|b2:b8:10:72:3f:74(xy1dra657.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:22.923990 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2019/03/19 18:51:22.924229 ->[10.50.34.199:6783|ea:a4:90:36:8c:e4(xy1dra658.corp.xy.com)]: connection shutting down due to error: cannot connect to ourself
INFO: 2019/03/19 18:51:22.924655 ->[10.50.35.12:6783|96:3b:8e:25:64:6a(xy1010050035012.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:22.924849 overlay_switch ->[96:3b:8e:25:64:6a(xy1010050035012.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:22.924960 ->[10.50.35.19:6783|12:30:75:29:cd:92(xy1010050035019.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:23.023412 ->[10.50.35.12:6783|96:3b:8e:25:64:6a(xy1010050035012.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:23.023497 overlay_switch ->[a6:9a:26:c8:46:c6(xy1010050035008.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:23.023781 ->[10.50.35.14:6783|66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:23.024486 ->[10.50.34.196:6783|be:9f:b5:37:a7:72(xy1dra655.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:23.024668 ->[10.50.34.199:54990|ea:a4:90:36:8c:e4(xy1dra658.corp.xy.com)]: connection shutting down due to error: cannot connect to ourself
INFO: 2019/03/19 18:51:23.024857 ->[10.50.35.8:6783|a6:9a:26:c8:46:c6(xy1010050035008.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:23.123244 Listening for metrics requests on 0.0.0.0:6782
INFO: 2019/03/19 18:51:23.123387 ->[10.50.35.7:6783|46:c1:59:91:d0:ea(xy1010050035007.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:23.124046 overlay_switch ->[66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:23.223388 overlay_switch ->[be:9f:b5:37:a7:72(xy1dra655.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:23.223541 overlay_switch ->[46:c1:59:91:d0:ea(xy1010050035007.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:23.224009 ->[10.50.35.11:6783|7e:5f:8a:45:61:51(xy1010050035011.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:23.224189 overlay_switch ->[7e:5f:8a:45:61:51(xy1010050035011.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:23.224361 ->[10.50.35.14:6783|66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:23.224645 ->[10.50.34.204:6783|22:83:2d:0c:5a:dc(xy1010050034204.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:23.224786 overlay_switch ->[22:83:2d:0c:5a:dc(xy1010050034204.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:23.323225 ->[10.50.35.9:6783|12:f2:33:54:1b:a6(xy1010050035009.corp.xy.com)]: connection ready; using protocol version 2
INFO: 2019/03/19 18:51:23.323312 overlay_switch ->[12:f2:33:54:1b:a6(xy1010050035009.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:23.323388 ->[10.50.34.196:6783|be:9f:b5:37:a7:72(xy1dra655.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:23.323601 ->[10.50.35.7:6783|46:c1:59:91:d0:ea(xy1010050035007.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:23.323803 ->[10.50.35.11:6783|7e:5f:8a:45:61:51(xy1010050035011.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:23.324028 ->[10.50.34.204:6783|22:83:2d:0c:5a:dc(xy1010050034204.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:23.324263 ->[10.50.35.9:6783|12:f2:33:54:1b:a6(xy1010050035009.corp.xy.com)]: connection added (new peer)
INFO: 2019/03/19 18:51:23.626959 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:23.627063 overlay_switch ->[92:8b:90:4e:79:09(xy1dra656.corp.xy.com)] using sleeve
INFO: 2019/03/19 18:51:23.627142 ->[10.50.34.197:6783|92:8b:90:4e:79:09(xy1dra656.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:23.637323 overlay_switch ->[92:8b:90:4e:79:09(xy1dra656.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:23.723344 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=3.10.0-957.1.3.el7.x86_64&flag_kubernetes-cluster-size=15&flag_kubernetes-cluster-uid=1ab5e380-7590-11e8-a373-0050568482ec&flag_kubernetes-version=v1.12.3&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&flag_network=fastdp&os=linux&signature=oshZeAJcD3hsHmfEuqAQ3CbnZbUniZGyv9wSo5g8U%2BE%3D&version=2.5.1: read tcp 10.50.34.199:56210->216.58.195.83:443: read: connection reset by peer
INFO: 2019/03/19 18:51:23.725645 ->[10.50.34.205:6783|aa:de:18:e8:93:8f(xy1010050034205.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:23.725964 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:23.726282 sleeve ->[10.50.34.205:6783|aa:de:18:e8:93:8f(xy1010050034205.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:23.727902 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:23.728003 overlay_switch ->[2e:3e:0c:1a:b3:61(xy1010050034200.corp.xy.com)] using sleeve
INFO: 2019/03/19 18:51:23.728035 ->[10.50.34.200:6783|2e:3e:0c:1a:b3:61(xy1010050034200.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:23.823447 overlay_switch ->[2e:3e:0c:1a:b3:61(xy1010050034200.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:23.824210 overlay_switch ->[e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)] using sleeve
INFO: 2019/03/19 18:51:23.824418 ->[10.50.35.10:6783|e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:23.824508 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:23.825641 ->[10.50.35.19:6783|12:30:75:29:cd:92(xy1010050035019.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:23.826208 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:23.923599 ->[10.50.35.12:6783|96:3b:8e:25:64:6a(xy1010050035012.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:23.923974 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:24.024527 sleeve ->[10.50.34.197:6783|92:8b:90:4e:79:09(xy1dra656.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.027239 ->[10.50.35.8:6783|a6:9a:26:c8:46:c6(xy1010050035008.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:24.027604 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:24.028360 overlay_switch ->[b2:b8:10:72:3f:74(xy1dra657.corp.xy.com)] using sleeve
INFO: 2019/03/19 18:51:24.029067 sleeve ->[10.50.34.200:6783|2e:3e:0c:1a:b3:61(xy1010050034200.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.123604 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:24.123613 ->[10.50.34.198:6783|b2:b8:10:72:3f:74(xy1dra657.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:24.123996 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:24.124208 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:24.124916 sleeve ->[10.50.35.12:6783|96:3b:8e:25:64:6a(xy1010050035012.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.125764 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:24.125868 overlay_switch ->[22:83:2d:0c:5a:dc(xy1010050034204.corp.xy.com)] using sleeve
INFO: 2019/03/19 18:51:24.134335 overlay_switch ->[22:83:2d:0c:5a:dc(xy1010050034204.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:24.134577 ->[10.50.35.9:6783|12:f2:33:54:1b:a6(xy1010050035009.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:24.134907 sleeve ->[10.50.35.8:6783|a6:9a:26:c8:46:c6(xy1010050035008.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.223266 sleeve ->[10.50.35.10:6783|e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.223287 overlay_switch ->[e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:24.223648 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:24.223940 ->[10.50.34.196:6783|be:9f:b5:37:a7:72(xy1dra655.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:24.224010 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:24.224134 overlay_switch ->[7e:5f:8a:45:61:51(xy1010050035011.corp.xy.com)] using sleeve
INFO: 2019/03/19 18:51:24.224192 overlay_switch ->[7e:5f:8a:45:61:51(xy1010050035011.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:24.225167 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/03/19 18:51:24.225791 overlay_switch ->[b2:b8:10:72:3f:74(xy1dra657.corp.xy.com)] using fastdp
INFO: 2019/03/19 18:51:24.226528 sleeve ->[10.50.34.196:6783|be:9f:b5:37:a7:72(xy1dra655.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.226905 sleeve ->[10.50.35.7:6783|46:c1:59:91:d0:ea(xy1010050035007.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.227270 sleeve ->[10.50.34.204:6783|22:83:2d:0c:5a:dc(xy1010050034204.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.227564 sleeve ->[10.50.35.14:6783|66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.227851 sleeve ->[10.50.35.19:6783|12:30:75:29:cd:92(xy1010050035019.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.228260 sleeve ->[10.50.35.11:6783|7e:5f:8a:45:61:51(xy1010050035011.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.228554 sleeve ->[10.50.35.9:6783|12:f2:33:54:1b:a6(xy1010050035009.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.323339 sleeve ->[10.50.34.198:6783|b2:b8:10:72:3f:74(xy1dra657.corp.xy.com)]: Effective MTU verified at 1438
INFO: 2019/03/19 18:51:24.323346 ->[10.50.35.7:6783|46:c1:59:91:d0:ea(xy1010050035007.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:24.323782 ->[10.50.34.204:6783|22:83:2d:0c:5a:dc(xy1010050034204.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:24.423323 ->[10.50.35.14:6783|66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:24.423750 ->[10.50.35.11:6783|7e:5f:8a:45:61:51(xy1010050035011.corp.xy.com)]: connection fully established
INFO: 2019/03/19 18:51:24.611362 Discovered remote MAC aa:de:18:e8:93:8f at aa:de:18:e8:93:8f(xy1010050034205.corp.xy.com)
INFO: 2019/03/19 18:51:24.823393 [kube-peers] Added myself to peer list &{[{7e:5f:8a:45:61:51 xy1010050035011.corp.xy.com} {96:3b:8e:25:64:6a xy1010050035012.corp.xy.com} {46:c1:59:91:d0:ea xy1010050035007.corp.xy.com} {e6:18:fc:c6:c0:c0 xy1010050035010.corp.xy.com} {12:f2:33:54:1b:a6 xy1010050035009.corp.xy.com} {66:1a:53:fc:8e:ea xy1010050035014.corp.xy.com} {be:9f:b5:37:a7:72 xy1dra655.corp.xy.com} {92:8b:90:4e:79:09 xy1dra656.corp.xy.com} {b2:b8:10:72:3f:74 xy1dra657.corp.xy.com} {ea:a4:90:36:8c:e4 xy1dra658.corp.xy.com} {2e:3e:0c:1a:b3:61 xy1010050034200.corp.xy.com} {22:83:2d:0c:5a:dc xy1010050034204.corp.xy.com} {aa:de:18:e8:93:8f xy1010050034205.corp.xy.com} {a6:9a:26:c8:46:c6 xy1010050035008.corp.xy.com} {12:30:75:29:cd:92 xy1010050035019.corp.xy.com}]}
DEBU: 2019/03/19 18:51:24.929614 [kube-peers] Nodes that have disappeared: map[]
172.16.216.0
10.50.34.200
10.50.34.204
10.50.34.205
10.50.35.7
10.50.35.8
10.50.35.9
10.50.35.10
10.50.35.11
10.50.35.12
10.50.35.14
10.50.35.19
10.50.34.196
10.50.34.197
10.50.34.198
10.50.34.199
DEBU: 2019/03/19 18:51:26.131401 registering for updates for node delete events
INFO: 2019/03/19 18:51:31.844403 Discovered remote MAC 22:83:2d:0c:5a:dc at 22:83:2d:0c:5a:dc(xy1010050034204.corp.xy.com)
INFO: 2019/03/19 18:51:31.908327 Discovered remote MAC ba:99:46:b1:ed:f1 at be:9f:b5:37:a7:72(xy1dra655.corp.xy.com)
INFO: 2019/03/19 18:51:40.832128 Discovered remote MAC d2:4e:52:b8:eb:fc at e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)
INFO: 2019/03/19 18:56:50.980959 Discovered remote MAC 0a:16:28:80:b5:3d at 46:c1:59:91:d0:ea(xy1010050035007.corp.xy.com)
INFO: 2019/03/19 19:03:28.694054 Discovered remote MAC 12:30:75:29:cd:92 at 12:30:75:29:cd:92(xy1010050035019.corp.xy.com)
INFO: 2019/03/19 19:05:13.250396 Discovered remote MAC 86:d8:3f:2f:31:71 at a6:9a:26:c8:46:c6(xy1010050035008.corp.xy.com)
INFO: 2019/03/19 19:05:15.092638 Discovered remote MAC be:9f:b5:37:a7:72 at be:9f:b5:37:a7:72(xy1dra655.corp.xy.com)
INFO: 2019/03/19 19:05:15.605427 Discovered remote MAC 92:8b:90:4e:79:09 at 92:8b:90:4e:79:09(xy1dra656.corp.xy.com)
INFO: 2019/03/19 19:05:17.258019 Discovered remote MAC 2e:3e:0c:1a:b3:61 at 2e:3e:0c:1a:b3:61(xy1010050034200.corp.xy.com)
INFO: 2019/03/19 19:05:17.438399 Discovered remote MAC b2:b8:10:72:3f:74 at b2:b8:10:72:3f:74(xy1dra657.corp.xy.com)
INFO: 2019/03/19 19:06:59.112015 Discovered remote MAC 7e:5f:8a:45:61:51 at 7e:5f:8a:45:61:51(xy1010050035011.corp.xy.com)
INFO: 2019/03/19 19:08:15.290517 Discovered remote MAC 66:1a:53:fc:8e:ea at 66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)
INFO: 2019/03/19 19:08:18.693464 Discovered remote MAC 96:3b:8e:25:64:6a at 96:3b:8e:25:64:6a(xy1010050035012.corp.xy.com)
INFO: 2019/03/19 19:08:33.505661 Discovered remote MAC 12:f2:33:54:1b:a6 at 12:f2:33:54:1b:a6(xy1010050035009.corp.xy.com)
INFO: 2019/03/19 19:09:52.660987 Discovered remote MAC 46:c1:59:91:d0:ea at 46:c1:59:91:d0:ea(xy1010050035007.corp.xy.com)
INFO: 2019/03/19 19:10:13.305601 Discovered remote MAC e6:18:fc:c6:c0:c0 at e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)
ERRO: 2019/03/19 19:14:17.538342 Captured frame from MAC (96:3b:8e:25:64:6a) to (d2:4e:52:b8:eb:fc) associated with another peer 96:3b:8e:25:64:6a(xy1010050035012.corp.xy.com)
INFO: 2019/03/19 19:20:13.172074 Discovered remote MAC 66:1a:53:fc:8e:ea at 66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)
ERRO: 2019/03/19 19:20:13.172186 Captured frame from MAC (66:1a:53:fc:8e:ea) to (86:d8:3f:2f:31:71) associated with another peer 66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)
ERRO: 2019/03/19 19:21:20.539318 Captured frame from MAC (66:1a:53:fc:8e:ea) to (86:d8:3f:2f:31:71) associated with another peer 66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)
ERRO: 2019/03/19 19:26:20.539412 Captured frame from MAC (66:1a:53:fc:8e:ea) to (86:d8:3f:2f:31:71) associated with another peer 66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)
ERRO: 2019/03/19 19:32:10.908964 Captured frame from MAC (66:1a:53:fc:8e:ea) to (86:d8:3f:2f:31:71) associated with another peer 66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)
ERRO: 2019/03/19 19:36:20.537993 Captured frame from MAC (66:1a:53:fc:8e:ea) to (86:d8:3f:2f:31:71) associated with another peer 66:1a:53:fc:8e:ea(xy1010050035014.corp.xy.com)
INFO: 2019/03/19 20:03:15.928208 Discovered remote MAC 12:f2:33:54:1b:a6 at 12:f2:33:54:1b:a6(xy1010050035009.corp.xy.com)
INFO: 2019/03/19 20:10:02.127659 Discovered remote MAC e6:18:fc:c6:c0:c0 at e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)
ERRO: 2019/03/19 20:10:02.127746 Captured frame from MAC (e6:18:fc:c6:c0:c0) to (86:d8:3f:2f:31:71) associated with another peer e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)
ERRO: 2019/03/19 20:11:20.539166 Captured frame from MAC (e6:18:fc:c6:c0:c0) to (86:d8:3f:2f:31:71) associated with another peer e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)
ERRO: 2019/03/19 20:16:20.539213 Captured frame from MAC (e6:18:fc:c6:c0:c0) to (86:d8:3f:2f:31:71) associated with another peer e6:18:fc:c6:c0:c0(xy1010050035010.corp.xy.com)
INFO: 2019/03/19 20:20:42.215391 Discovered remote MAC 12:30:75:29:cd:92 at 12:30:75:29:cd:92(xy1010050035019.corp.xy.com)
INFO: 2019/03/19 20:21:36.460277 Discovered remote MAC b2:b8:10:72:3f:74 at b2:b8:10:72:3f:74(xy1dra657.corp.xy.com)
Weave report - note this was taken today. One difference compared to the dev environment is the IP range: in dev there is only one instance where an allocated range ends with .0, while in stage and prod, as shown below, there are several such instances; not sure if this is expected given the CIDR range. (With a /16 netmask, addresses ending in .0 such as 172.16.72.0 are ordinary host addresses, so this may just be a consequence of the larger range.)
| IPRange | Size | Host |
|---|---|---|
| 172.16.0.0 | 2 | xy1010050035014.corp.xy.com |
| 172.16.0.2 | 1 | xy1010050035019.corp.xy.com |
| 172.16.0.3 | 4093 | xy1010050035014.corp.xy.com |
| 172.16.16.0 | 4096 | xy1010050035010.corp.xy.com |
| 172.16.32.0 | 1 | xy1010050035009.corp.xy.com |
| 172.16.32.1 | 2047 | xy1010050035014.corp.xy.com |
| 172.16.40.0 | 2048 | xy1010050034200.corp.xy.com |
| 172.16.48.0 | 4096 | xy1010050035009.corp.xy.com |
| 172.16.64.0 | 1 | xy1010050035014.corp.xy.com |
| 172.16.64.1 | 2047 | xy1010050035007.corp.xy.com |
| 172.16.72.0 | 2048 | xy1010050034204.corp.xy.com |
| 172.16.80.0 | 1 | xy1010050034200.corp.xy.com |
| 172.16.80.1 | 2047 | xy1010050035014.corp.xy.com |
| 172.16.88.0 | 2048 | xy1dra658.corp.xy.com |
| 172.16.96.0 | 2048 | xy1010050035014.corp.xy.com |
| 172.16.104.0 | 2048 | xy1dra656.corp.xy.com |
| 172.16.112.0 | 1 | xy1010050035014.corp.xy.com |
| 172.16.112.1 | 2047 | xy1010050035007.corp.xy.com |
| 172.16.120.0 | 1 | xy1dra656.corp.xy.com |
| 172.16.120.1 | 2047 | xy1010050035014.corp.xy.com |
| 172.16.128.0 | 1 | xy1010050035011.corp.xy.com |
| 172.16.128.1 | 2047 | xy1010050035014.corp.xy.com |
| 172.16.136.0 | 2048 | xy1010050035008.corp.xy.com |
| 172.16.144.0 | 4096 | xy1dra655.corp.xy.com |
| 172.16.160.0 | 4096 | xy1dra657.corp.xy.com |
| 172.16.176.0 | 1 | xy1010050035012.corp.xy.com |
| 172.16.176.1 | 2047 | xy1010050035014.corp.xy.com |
| 172.16.184.0 | 2048 | xy1010050035019.corp.xy.com |
| 172.16.192.0 | 1 | xy1010050035008.corp.xy.com |
| 172.16.192.1 | 6143 | xy1010050035011.corp.xy.com |
| 172.16.216.0 | 1 | xy1dra658.corp.xy.com |
| 172.16.216.1 | 2047 | xy1010050035014.corp.xy.com |
| 172.16.224.0 | 1 | xy1010050035012.corp.xy.com |
| 172.16.224.1 | 1 | xy1010050035010.corp.xy.com |
| 172.16.224.2 | 2046 | xy1010050035012.corp.xy.com |
| 172.16.232.0 | 2048 | xy1010050034205.corp.xy.com |
| 172.16.240.0 | 2048 | xy1010050035012.corp.xy.com |
| 172.16.248.0 | 1 | xy1dra655.corp.xy.com |
| 172.16.248.1 | 2047 | xy1010050035014.corp.xy.com |
Thanks for sharing the logs.
> Quick question: we are planning to have CoreDNS listen on the host network instead of the overlay IP, since the offending source address has always been associated with a CoreDNS pod IP. Do you think it may help?
I don't see any reason why this problem would be particular to CoreDNS. It might happen to any pod-to-pod communication over the overlay network.
> Our dev cluster has cluster CIDR 172.200.0.0/24 while stage and prod are 172.16.0.0/16. We have never encountered this issue in dev, while everything else remains the same. Do you think this can be a contributing factor?
Contrary to your observation, 172.200.0.0/24 is not an RFC 1918 private IP range, so if anything that is the one I would expect to cause problems.
As far as I have seen, martian packets are typically the result of routing misconfigurations. Weave Net does very little routing configuration, as it deals with L2 switching, so it's hard to guess what the contributing factor could be. Even harder to reproduce, unfortunately.
> Contrary to your observation, 172.200.0.0/24 is not an RFC 1918 private IP range, so if anything that is the one I would expect to cause problems.
Good point. Yes, you are right, this was our first dev cluster :)
> I don't see any reason why this problem would be particular to CoreDNS. It might happen to any pod-to-pod communication over the overlay network.
Agreed. However, whenever we saw this issue the offending pods were CoreDNS, and on the 19th, the last time we got hit by the issue, stage as well as prod exhibited the same problem and the pods were CoreDNS, so I think there might be some correlation.
> As far as I have seen, martian packets are typically the result of routing misconfigurations.
Is there any other detail that might help troubleshoot this?
@paphillon in the middle of your logs is a list of IPs which starts with the gateway address on the bridge, but then includes some peer IPs. I am puzzled how this comes about.
To check, could you run ip addr show dev weave on that host and share what comes back.
@bboreham Thanks! Yes, that jumped out at me too, but I didn't have enough information to say whether that is expected or not.
Here is the output of ip addr show, as requested, for the host ea:a4:90:36:8c:e4(xy1dra658.corp.xy.com):
10: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1337 qdisc noqueue state UP group default qlen 1000
link/ether ea:a4:90:36:8c:e4 brd ff:ff:ff:ff:ff:ff
inet 172.16.216.0/16 brd 172.16.255.255 scope global weave
valid_lft forever preferred_lft forever
inet6 fe80::e8a4:90ff:fe36:8ce4/64 scope link
valid_lft forever preferred_lft forever
Another worker host shows similar output:
10: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1337 qdisc noqueue state UP group default qlen 1000
link/ether c2:e7:e5:c4:3c:2d brd ff:ff:ff:ff:ff:ff
inet 172.16.148.0/16 brd 172.16.255.255 scope global weave
valid_lft forever preferred_lft forever
inet6 fe80::c0e7:e5ff:fec4:3c2d/64 scope link
valid_lft forever preferred_lft forever
@bboreham Do you think this issue may be related to #3620?
Do you have that symptom?
@bboreham - I am not sure if we have the same issue; is there a command I can run to find out whether the same IPs are assigned to a pod/node? During our setup, we did remove and re-add a couple of nodes.
To reproduce the martian error we blocked all incoming and outgoing traffic on the node running CoreDNS, and by doing so we saw the martian errors being logged. However, once the traffic was allowed again the errors stopped, so it was only a partial success in reproducing the error. For now we have moved CoreDNS to the host network, repeated the above test, and did not notice any martian errors in the logs.
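For reference, the blocking step was along these lines (a minimal iptables sketch, not necessarily the exact rules we used; run it from the console, since it also cuts SSH):

```sh
# Drop all traffic on the node hosting CoreDNS to provoke the martians...
iptables -I INPUT 1 -j DROP
iptables -I OUTPUT 1 -j DROP
# ...watch for "martian source" lines on the peer nodes, then restore:
iptables -D INPUT -j DROP
iptables -D OUTPUT -j DROP
```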
You can get pod IPs via kubectl get pods -o wide.
I can't think of a way to get the node gateway addresses short of visiting each node and running ip addr show dev weave; a loop like the sketch below could automate that.
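Something like this could do the visiting in one pass (a sketch; it assumes the node names reported by kubectl are reachable over SSH):

```sh
# Collect the weave bridge address from every node.
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  echo -n "$node "
  ssh "$node" "ip -4 addr show dev weave" | awk '/inet /{print $2}'
done
```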
Pod IPs below. I did not find a node/pod IP conflict though.
[user@xy1010050035017 ~]$ kubectl get pods -o wide --all-namespaces | grep 172.16
app-admin app-admin-client-0 1/1 Running 0 4d1h 172.16.88.14 xy1dra658.corp.xy.com <none>
app-admin app-admin-client-1 1/1 Running 0 4d2h 172.16.144.21 xy1dra655.corp.xy.com <none>
app-admin app-admin-client-2 1/1 Running 0 4d2h 172.16.104.14 xy1dra656.corp.xy.com <none>
app-core app-kafka-console-admin-0 1/1 Running 0 4d2h 172.16.144.22 xy1dra655.corp.xy.com <none>
app-core app-monitor-6f7d4549bb-6tqhh 2/2 Running 0 4d1h 172.16.88.12 xy1dra658.corp.xy.com <none>
app-pub app-rest-pub-0 2/2 Running 0 4d2h 172.16.104.13 xy1dra656.corp.xy.com <none>
app-pub app-rest-pub-1 2/2 Running 0 4d1h 172.16.88.13 xy1dra658.corp.xy.com <none>
app-pub app-rest-pub-2 2/2 Running 0 4d2h 172.16.144.20 xy1dra655.corp.xy.com <none>
kube-system heapster-79649856bb-zhwhc 1/1 Running 0 4d3h 172.16.136.4 xy1010050035008.corp.xy.com <none>
kube-system kubernetes-dashboard-7b5f4695c4-nn598 1/1 Running 0 4d1h 172.16.88.11 xy1dra658.corp.xy.com <none>
kube-system monitoring-influxdb-54594499c5-h8658 1/1 Running 0 4d2h 172.16.16.23 xy1010050035010.corp.xy.com <none>
kube-system tiller-deploy-57fdc789bd-zqhwm 1/1 Running 0 4d3h 172.16.48.16 xy1010050035009.corp.xy.com <none>
monitoring alertmanager-7bd44bfc96-6v6h9 1/1 Running 0 4d2h 172.16.16.25 xy1010050035010.corp.xy.com <none>
monitoring grafana-747f9bf496-stj9s 2/2 Running 0 4d4h 172.16.64.18 xy1010050035007.corp.xy.com <none>
monitoring kube-state-metrics-794cbf686b-b84mc 4/4 Running 0 4d3h 172.16.48.17 xy1010050035009.corp.xy.com <none>
monitoring monitoring.kubewatch-6bc78f8bb4-m4f7f 1/1 Running 0 4d2h 172.16.16.24 xy1010050035010.corp.xy.com <none>
monitoring node-problem-detector-6c68w 1/1 Running 3 124d 172.16.64.17 xy1010050035007.corp.xy.com <none>
monitoring node-problem-detector-7qllk 1/1 Running 3 124d 172.16.40.3 xy1010050034200.corp.xy.com <none>
monitoring node-problem-detector-c4sjz 1/1 Running 5 124d 172.16.160.6 xy1dra657.corp.xy.com <none>
monitoring node-problem-detector-d2pll 1/1 Running 3 124d 172.16.192.6 xy1010050035011.corp.xy.com <none>
monitoring node-problem-detector-gwzdv 1/1 Running 4 124d 172.16.72.4 xy1010050034204.corp.xy.com <none>
monitoring node-problem-detector-k7whw 1/1 Running 3 124d 172.16.224.7 xy1010050035012.corp.xy.com <none>
monitoring node-problem-detector-ks7qd 1/1 Running 3 124d 172.16.232.4 xy1010050034205.corp.xy.com <none>
monitoring node-problem-detector-n5k9j 1/1 Running 5 124d 172.16.88.10 xy1dra658.corp.xy.com <none>
monitoring node-problem-detector-nmv5c 1/1 Running 6 124d 172.16.104.12 xy1dra656.corp.xy.com <none>
monitoring node-problem-detector-nnbdq 1/1 Running 4 124d 172.16.16.22 xy1010050035010.corp.xy.com <none>
monitoring node-problem-detector-nx5gj 1/1 Running 3 124d 172.16.48.14 xy1010050035009.corp.xy.com <none>
monitoring node-problem-detector-qlj6v 1/1 Running 3 124d 172.16.0.5 xy1010050035014.corp.xy.com <none>
monitoring node-problem-detector-qs9g8 1/1 Running 3 124d 172.16.136.3 xy1010050035008.corp.xy.com <none>
monitoring node-problem-detector-w6qn5 1/1 Running 3 124d 172.16.184.4 xy1010050035019.corp.xy.com <none>
monitoring node-problem-detector-whr9h 1/1 Running 5 124d 172.16.144.18 xy1dra655.corp.xy.com <none>
monitoring prometheus-7ffbf96956-d22tx 2/2 Running 0 4d3h 172.16.144.19 xy1dra655.corp.xy.com <none>
Node IPs (dev weave). The only odd ones are four ending with .1 and one ending with .2; is that expected?
xy1dra655.corp.xy.com 172.16.248.0/16
xy1dra656.corp.xy.com 172.16.120.0/16
xy1dra657.corp.xy.com 172.16.160.0/16
xy1dra658.corp.xy.com 172.16.216.0/16
xy1010050034200.corp.xy.com 172.16.80.0/16
xy1010050034204.corp.xy.com 172.16.72.0/16
xy1010050034205.corp.xy.com 172.16.232.0/16
xy1010050035007.corp.xy.com 172.16.64.1/16
xy1010050035008.corp.xy.com 172.16.192.0/16
xy1010050035009.corp.xy.com 172.16.32.0/16
xy1010050035010.corp.xy.com 172.16.224.1/16
xy1010050035019.corp.xy.com 172.16.0.2/16
xy1010050035011.corp.xy.com 172.16.192.1/16
xy1010050035012.corp.xy.com 172.16.224.0/16
xy1010050035014.corp.xy.com 172.16.64.0/16
I do see them contributing to the martian errors:
Apr 6 21:03:30 xy1010050035011 kernel: IPv4: martian source 172.16.64.1 from 172.16.64.19, on dev datapath
Apr 6 21:06:16 xy1010050035008 kernel: IPv4: martian source 172.16.64.19 from 172.16.192.1, on dev datapath
@bboreham - We had a network outage today for roughly 10 minutes where some nodes could not communicate with each other or were really slow, and the result was the same: martian errors from pods running on two nodes with weave addresses. The only way to resolve it was to drain those nodes and reboot.
Same issue here.
We have a node that cannot communicate with pods on other nodes, and TCP connections also fail with No route to host.
This issue happens 30-60+ minutes after host boot, and it's unpredictable. The only way to solve it is to reboot.
Weavenet version: 2.6.1
Kubernetes version: 1.17.3
Host: ubuntu 18.04 LTS
Kernel: 4.15.0-88-generic
Kubernetes serviceSubnet is 10.0.0.0/12, and the weavenet subnet is the default 10.32.0.0/12.
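For what it's worth, those two /12 ranges (netmask 255.240.0.0) do not overlap, so this is not a plain subnet collision between services and pods:

```
serviceSubnet 10.0.0.0/12  -> 10.0.0.0  - 10.15.255.255  (the kube-ipvs0 addresses below)
weave         10.32.0.0/12 -> 10.32.0.0 - 10.47.255.255  (weave bridge and pod IPs,
                                                          e.g. 10.37.0.1)
```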
Pinging another pod's IP:
PING 10.42.0.10 (10.42.0.10) 56(84) bytes of data.
From 10.37.0.1 icmp_seq=1 Destination Host Unreachable
From 10.37.0.1 icmp_seq=2 Destination Host Unreachable
Lots of martian logs:
[13225.481477] IPv4: martian source 10.32.0.68 from 10.37.0.1, on dev datapath
[13225.485221] ll header: 00000000: ff ff ff ff ff ff a2 a9 9c 3c 7d b8 08 06 .........<}...
[13226.492261] IPv4: martian source 10.32.0.68 from 10.37.0.1, on dev datapath
[13226.495105] ll header: 00000000: ff ff ff ff ff ff a2 a9 9c 3c 7d b8 08 06 .........<}...
[13227.516292] IPv4: martian source 10.32.0.68 from 10.37.0.1, on dev datapath
[13227.520102] ll header: 00000000: ff ff ff ff ff ff a2 a9 9c 3c 7d b8 08 06 .........<}...
[13229.116534] IPv4: martian source 10.32.0.68 from 10.37.0.1, on dev datapath
[13229.118467] ll header: 00000000: ff ff ff ff ff ff a2 a9 9c 3c 7d b8 08 06 .........<}...
[13230.140568] IPv4: martian source 10.32.0.68 from 10.37.0.1, on dev datapath
[13230.144627] ll header: 00000000: ff ff ff ff ff ff a2 a9 9c 3c 7d b8 08 06 .........<}...
[13231.168156] IPv4: martian source 10.32.0.68 from 10.37.0.1, on dev datapath
[13231.170576] ll header: 00000000: ff ff ff ff ff ff a2 a9 9c 3c 7d b8 08 06 .........<}...
[13232.077072] IPv4: martian source 10.42.0.4 from 10.37.0.13, on dev eth0
[13232.077234] IPv4: martian source 10.32.0.58 from 10.37.0.13, on dev eth0
[13232.077631] IPv4: martian source 10.38.0.2 from 10.37.0.13, on dev eth0
[13232.077635] IPv4: martian source 10.41.0.23 from 10.37.0.13, on dev eth0
[13232.077639] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13232.077644] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13232.078089] IPv4: martian source 10.44.128.6 from 10.37.0.13, on dev eth0
[13232.078094] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13232.078484] IPv4: martian source 10.44.0.3 from 10.37.0.13, on dev eth0
[13232.078488] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13232.078883] IPv4: martian source 10.46.0.4 from 10.37.0.13, on dev eth0
[13232.078887] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13232.080465] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13232.106788] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13232.767115] IPv4: martian source 10.32.0.68 from 10.37.0.1, on dev datapath
[13232.771176] ll header: 00000000: ff ff ff ff ff ff a2 a9 9c 3c 7d b8 08 06 .........<}...
[13233.084547] IPv4: martian source 10.42.0.4 from 10.37.0.13, on dev eth0
[13233.084588] IPv4: martian source 10.46.0.4 from 10.37.0.13, on dev eth0
[13233.088171] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13233.089261] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13237.120759] net_ratelimit: 29 callbacks suppressed
[13237.120763] IPv4: martian source 10.32.0.68 from 10.37.0.1, on dev datapath
[13237.125024] ll header: 00000000: ff ff ff ff ff ff a2 a9 9c 3c 7d b8 08 06 .........<}...
[13237.180616] IPv4: martian source 10.42.0.4 from 10.37.0.13, on dev eth0
[13237.184236] IPv4: martian source 10.46.0.4 from 10.37.0.13, on dev eth0
[13237.185068] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13237.189588] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13237.189682] IPv4: martian source 10.44.128.6 from 10.37.0.13, on dev eth0
[13237.194244] IPv4: martian source 10.44.0.3 from 10.37.0.13, on dev eth0
[13237.197311] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13237.201721] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13237.203166] IPv4: martian source 10.41.0.23 from 10.37.0.13, on dev eth0
[13237.204637] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13237.206051] IPv4: martian source 10.38.0.2 from 10.37.0.13, on dev eth0
[13237.207453] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13237.209091] IPv4: martian source 10.32.0.58 from 10.37.0.13, on dev eth0
[13237.210725] ll header: 00000000: ff ff ff ff ff ff 72 8d 36 e8 84 fd 08 06 ......r.6.....
[13238.140459] IPv4: martian source 10.32.0.68 from 10.37.0.1, on dev datapath
[13238.143368] ll header: 00000000: ff ff ff ff ff ff a2 a9 9c 3c 7d b8 08 06 .........<}...
[13240.032970] IPv4: martian source 10.32.0.68 from 10.37.0.1, on dev datapath
[13240.036274] ll header: 00000000: ff ff ff ff ff ff a2 a9 9c 3c 7d b8 08 06 .........<}...
Full ip addr output:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:ff:30:cf brd ff:ff:ff:ff:ff:ff
inet 10.245.146.207/24 brd 10.245.146.255 scope global dynamic ens3
valid_lft 309sec preferred_lft 309sec
inet6 fe80::5054:ff:feff:30cf/64 scope link
valid_lft forever preferred_lft forever
3: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 3a:b6:9b:6c:2d:4f brd ff:ff:ff:ff:ff:ff
inet 10.6.101.153/32 brd 10.6.101.153 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.2.58.191/32 brd 10.2.58.191 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.3.217.234/32 brd 10.3.217.234 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.14.152.106/32 brd 10.14.152.106 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.6.206.54/32 brd 10.6.206.54 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.13.140.2/32 brd 10.13.140.2 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.4.39.249/32 brd 10.4.39.249 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.8.202.155/32 brd 10.8.202.155 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.1.153.71/32 brd 10.1.153.71 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.10.166.143/32 brd 10.10.166.143 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.3.85.6/32 brd 10.3.85.6 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.0.160.252/32 brd 10.0.160.252 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.2.138.242/32 brd 10.2.138.242 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.5.64.68/32 brd 10.5.64.68 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.12.106.25/32 brd 10.12.106.25 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.7.83.155/32 brd 10.7.83.155 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.4.15.144/32 brd 10.4.15.144 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.7.72.209/32 brd 10.7.72.209 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.5.64.80/32 brd 10.5.64.80 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.2.100.200/32 brd 10.2.100.200 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.12.224.11/32 brd 10.12.224.11 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.11.51.149/32 brd 10.11.51.149 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.15.44.169/32 brd 10.15.44.169 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.7.193.249/32 brd 10.7.193.249 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.10.253.200/32 brd 10.10.253.200 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.6.107.58/32 brd 10.6.107.58 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.5.173.80/32 brd 10.5.173.80 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.4.187.121/32 brd 10.4.187.121 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.4.2.55/32 brd 10.4.2.55 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.5.207.7/32 brd 10.5.207.7 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.1.253.133/32 brd 10.1.253.133 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.0.0.10/32 brd 10.0.0.10 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.13.37.201/32 brd 10.13.37.201 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.7.103.123/32 brd 10.7.103.123 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.13.1.179/32 brd 10.13.1.179 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.10.178.151/32 brd 10.10.178.151 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.12.164.220/32 brd 10.12.164.220 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.15.51.86/32 brd 10.15.51.86 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.15.0.41/32 brd 10.15.0.41 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.11.235.61/32 brd 10.11.235.61 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.2.186.122/32 brd 10.2.186.122 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.8.183.82/32 brd 10.8.183.82 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.0.0.1/32 brd 10.0.0.1 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.2.140.95/32 brd 10.2.140.95 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.14.37.97/32 brd 10.14.37.97 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.14.36.6/32 brd 10.14.36.6 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.14.244.186/32 brd 10.14.244.186 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.10.172.56/32 brd 10.10.172.56 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.0.117.144/32 brd 10.0.117.144 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.5.226.178/32 brd 10.5.226.178 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.6.156.9/32 brd 10.6.156.9 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.0.232.115/32 brd 10.0.232.115 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.7.95.46/32 brd 10.7.95.46 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.5.111.37/32 brd 10.5.111.37 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.3.120.120/32 brd 10.3.120.120 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.1.229.71/32 brd 10.1.229.71 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.1.150.145/32 brd 10.1.150.145 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.13.121.0/32 brd 10.13.121.0 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.0.124.154/32 brd 10.0.124.154 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.10.16.142/32 brd 10.10.16.142 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.6.56.245/32 brd 10.6.56.245 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.10.240.173/32 brd 10.10.240.173 scope global kube-ipvs0
valid_lft forever preferred_lft forever
4: datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether be:be:1e:ed:ca:ef brd ff:ff:ff:ff:ff:ff
inet6 fe80::bcbe:1eff:feed:caef/64 scope link
valid_lft forever preferred_lft forever
6: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default qlen 1000
link/ether a2:a9:9c:3c:7d:b8 brd ff:ff:ff:ff:ff:ff
inet 10.37.0.1/12 brd 10.47.255.255 scope global weave
valid_lft forever preferred_lft forever
inet6 fe80::a0a9:9cff:fe3c:7db8/64 scope link
valid_lft forever preferred_lft forever
8: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master datapath state UP group default
link/ether b2:bc:14:8f:c7:54 brd ff:ff:ff:ff:ff:ff
inet6 fe80::b0bc:14ff:fe8f:c754/64 scope link
valid_lft forever preferred_lft forever
9: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether 36:96:ad:ae:12:17 brd ff:ff:ff:ff:ff:ff
inet6 fe80::3496:adff:feae:1217/64 scope link
valid_lft forever preferred_lft forever
10: vxlan-6784: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue master datapath state UNKNOWN group default qlen 1000
link/ether fe:a9:2e:36:96:5b brd ff:ff:ff:ff:ff:ff
inet6 fe80::fca9:2eff:fe36:965b/64 scope link
valid_lft forever preferred_lft forever
12: vethwepl18713f4@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether f6:4f:18:a0:e0:e3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::f44f:18ff:fea0:e0e3/64 scope link
valid_lft forever preferred_lft forever
16: vethwepl2a4b84d@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether 92:16:33:47:8c:20 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::9016:33ff:fe47:8c20/64 scope link
valid_lft forever preferred_lft forever
18: vethwepl8f96a10@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether ba:9d:49:e8:e7:ec brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::b89d:49ff:fee8:e7ec/64 scope link
valid_lft forever preferred_lft forever
20: vethwepl7189f3c@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether ae:84:51:c2:9e:3e brd ff:ff:ff:ff:ff:ff link-netnsid 4
inet6 fe80::ac84:51ff:fec2:9e3e/64 scope link
valid_lft forever preferred_lft forever
22: vethwepl933406f@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether 3a:75:57:aa:b1:63 brd ff:ff:ff:ff:ff:ff link-netnsid 5
inet6 fe80::3875:57ff:feaa:b163/64 scope link
valid_lft forever preferred_lft forever
24: vethwepl0dd8eaf@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether 9a:23:34:c6:60:66 brd ff:ff:ff:ff:ff:ff link-netnsid 6
inet6 fe80::9823:34ff:fec6:6066/64 scope link
valid_lft forever preferred_lft forever
26: vethweple1de9cf@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether 16:be:df:fd:1e:90 brd ff:ff:ff:ff:ff:ff link-netnsid 7
inet6 fe80::14be:dfff:fefd:1e90/64 scope link
valid_lft forever preferred_lft forever
30: vethwepl253e15b@if29: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether 76:16:3b:2a:95:fd brd ff:ff:ff:ff:ff:ff link-netnsid 9
inet6 fe80::7416:3bff:fe2a:95fd/64 scope link
valid_lft forever preferred_lft forever
32: vethweplcbbd076@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether 76:65:9b:12:13:75 brd ff:ff:ff:ff:ff:ff link-netnsid 10
inet6 fe80::7465:9bff:fe12:1375/64 scope link
valid_lft forever preferred_lft forever
34: vethwepl994227f@if33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether d6:1b:21:2e:4c:5a brd ff:ff:ff:ff:ff:ff link-netnsid 11
inet6 fe80::d41b:21ff:fe2e:4c5a/64 scope link
valid_lft forever preferred_lft forever
36: vethwepld444910@if35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default
link/ether d2:a7:1c:67:d6:c0 brd ff:ff:ff:ff:ff:ff link-netnsid 12
inet6 fe80::d0a7:1cff:fe67:d6c0/64 scope link
valid_lft forever preferred_lft forever
sysctl --system shows:
* Applying /etc/sysctl.d/10-console-messages.conf ...
kernel.printk = 4 4 1 7
* Applying /etc/sysctl.d/10-ipv6-privacy.conf ...
* Applying /etc/sysctl.d/10-kernel-hardening.conf ...
kernel.kptr_restrict = 1
* Applying /etc/sysctl.d/10-link-restrictions.conf ...
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
* Applying /etc/sysctl.d/10-lxd-inotify.conf ...
fs.inotify.max_user_instances = 1024
* Applying /etc/sysctl.d/10-magic-sysrq.conf ...
kernel.sysrq = 176
* Applying /etc/sysctl.d/10-network-security.conf ...
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.tcp_syncookies = 1
* Applying /etc/sysctl.d/10-ptrace.conf ...
kernel.yama.ptrace_scope = 1
* Applying /etc/sysctl.d/10-zeropage.conf ...
vm.mmap_min_addr = 65536
* Applying /usr/lib/sysctl.d/50-default.conf ...
net.ipv4.conf.all.promote_secondaries = 1
net.core.default_qdisc = fq_codel
* Applying /etc/sysctl.d/60-k8s.conf ...
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
* Applying /etc/sysctl.d/60-kernel-hardening.conf ...
kernel.kptr_restrict = 1
kernel.yama.ptrace_scope = 1
kernel.perf_event_paranoid = 2
kernel.randomize_va_space = 2
vm.mmap_min_addr = 65536
kernel.panic = 10
kernel.sysrq = 176
* Applying /etc/sysctl.d/60-net-mem-tune.conf ...
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
* Applying /etc/sysctl.d/60-net-misc.conf ...
net.ipv4.ip_forward = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1
net.ipv4.tcp_rfc1337 = 1
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 5
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 4096
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_mtu_probing = 1
net.ipv4.ip_no_pmtu_disc = 1
net.ipv6.conf.all.forwarding = 1
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 0
net.ipv6.conf.all.proxy_ndp = 1
net.ipv6.conf.all.use_tempaddr = 0
net.ipv6.conf.default.use_tempaddr = 0
* Applying /etc/sysctl.d/60-sys-tune.conf ...
fs.inotify.max_user_watches = 524288
vm.swappiness = 0
vm.overcommit_memory = 1
fs.file-max = 51200
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 524288
kernel.printk = 4 4 1 7
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
* Applying /etc/sysctl.d/99-sysctl.conf ...
* Applying /etc/sysctl.conf ...
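One thing worth noting in that output: sysctl --system applies the files in lexical order, so the later 60-net-misc.conf (rp_filter = 0, log_martians = 1) overrides the earlier 10-network-security.conf (rp_filter = 1). Worth confirming the effective values on the affected host:

```sh
sysctl net.ipv4.conf.all.rp_filter \
       net.ipv4.conf.default.rp_filter \
       net.ipv4.conf.all.log_martians
```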
I'm dealing with this issue as well. We have Kubernetes clusters on AWS, Azure, and GCP and we're only seeing the martian source errors on GCP. It's also only these clusters that occasionally have network troubles on a single node that require restarting weave to fix. The logs are similar to previous posts.
[181603.734056] IPv4: martian source 172.16.160.12 from 172.16.248.0, on dev datapath
[181603.734059] ll header: 00000000: ff ff ff ff ff ff 6a 1c 4e 92 9b 93 08 06
[181603.734818] IPv4: martian source 172.16.248.12 from 172.16.160.12, on dev datapath
[181603.734848] ll header: 00000000: ff ff ff ff ff ff 3e d7 bd 6c 07 2d 08 06
[181603.820470] IPv4: martian source 172.16.160.0 from 172.16.160.2, on dev datapath
[181603.820474] ll header: 00000000: ff ff ff ff ff ff 5a bb 1e 31 5f 48 08 06
[181603.820568] IPv4: martian source 172.16.160.2 from 172.16.0.4, on dev datapath
[181603.820571] ll header: 00000000: ff ff ff ff ff ff 52 84 98 04 d9 11 08 06
[181609.911887] IPv4: martian source 172.16.72.2 from 172.16.0.4, on dev datapath
[181609.911925] ll header: 00000000: ff ff ff ff ff ff 52 84 98 04 d9 11 08 06
[181611.563568] IPv4: martian source 172.16.160.0 from 172.16.160.3, on dev datapath
[181611.563571] ll header: 00000000: ff ff ff ff ff ff 22 47 32 8c 5e f3 08 06
[181616.535650] IPv4: martian source 172.16.136.7 from 172.16.248.0, on dev datapath
[181616.535654] ll header: 00000000: ff ff ff ff ff ff 6a 1c 4e 92 9b 93 08 06
[181616.535749] IPv4: martian source 172.16.248.12 from 172.16.248.17, on dev datapath
[181616.535750] ll header: 00000000: ff ff ff ff ff ff da 6c 5c 7d 6e 40 08 06
[181622.566883] IPv4: martian source 172.16.16.0 from 172.16.248.17, on dev datapath
[181622.566913] ll header: 00000000: ff ff ff ff ff ff da 6c 5c 7d 6e 40 08 06
[181623.294772] IPv4: martian source 172.16.32.1 from 172.16.0.4, on dev datapath
[181623.294808] ll header: 00000000: ff ff ff ff ff ff 52 84 98 04 d9 11 08 06