flannel
Pods cannot ping each other in multi-host scenario - failed to add vxlanRoute (XXX -> X.Y.0.0): invalid argument
Pods on different hosts cannot ping each other. Flannel logs are below:
I1018 17:58:53.498781 1 main.go:470] Determining IP address of default interface
I1018 17:58:53.499196 1 main.go:483] Using interface with name eth0 and address 172.28.249.156
I1018 17:58:53.499243 1 main.go:500] Defaulting external address to interface address (172.28.249.156)
I1018 17:58:53.517275 1 kube.go:130] Waiting 10m0s for node controller to sync
I1018 17:58:53.517332 1 kube.go:283] Starting kube subnet manager
I1018 17:58:54.517591 1 kube.go:137] Node controller sync successful
I1018 17:58:54.517652 1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - scarif-admin-2
I1018 17:58:54.517661 1 main.go:238] Installing signal handlers
I1018 17:58:54.517821 1 main.go:348] Found network config - Backend type: vxlan
I1018 17:58:54.517912 1 vxlan.go:119] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
I1018 17:58:54.573370 1 main.go:295] Wrote subnet file to /run/flannel/subnet.env
I1018 17:58:54.573408 1 main.go:299] Running backend.
I1018 17:58:54.573427 1 main.go:317] Waiting for all goroutines to exit
I1018 17:58:54.573496 1 vxlan_network.go:56] watching for new subnet leases
**E1018 17:58:54.573780 1 vxlan_network.go:158] failed to add vxlanRoute (172.16.0.0/24 -> 172.16.0.0): invalid argument**
I1018 17:58:54.577620 1 ipmasq.go:75] Some iptables rules are missing; deleting and recreating rules
I1018 17:58:54.577673 1 ipmasq.go:97] Deleting iptables rule: -s 172.16.0.0/16 -d 172.16.0.0/16 -j RETURN
I1018 17:58:54.579324 1 ipmasq.go:97] Deleting iptables rule: -s 172.16.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1018 17:58:54.580870 1 ipmasq.go:97] Deleting iptables rule: ! -s 172.16.0.0/16 -d 172.16.1.0/24 -j RETURN
I1018 17:58:54.582349 1 ipmasq.go:97] Deleting iptables rule: ! -s 172.16.0.0/16 -d 172.16.0.0/16 -j MASQUERADE
I1018 17:58:54.583900 1 ipmasq.go:85] Adding iptables rule: -s 172.16.0.0/16 -d 172.16.0.0/16 -j RETURN
I1018 17:58:54.587553 1 ipmasq.go:85] Adding iptables rule: -s 172.16.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1018 17:58:54.591290 1 ipmasq.go:85] Adding iptables rule: ! -s 172.16.0.0/16 -d 172.16.1.0/24 -j RETURN
I1018 17:58:54.595032 1 ipmasq.go:85] Adding iptables rule: ! -s 172.16.0.0/16 -d 172.16.0.0/16 -j MASQUERADE
Your Environment
- Flannel version: 0.9
- Backend used (e.g. vxlan or udp): vxlan
- Etcd version:
- Kubernetes version (if used): 1.8
- Operating System and version: CentOS 7.3, Docker 17.06
What I think is interesting is " E1018 17:58:54.573780 1 vxlan_network.go:158] failed to add vxlanRoute (172.16.0.0/24 -> 172.16.0.0): invalid argument "
Yes, that line is the smoking gun. What other nodes do you have? Can you output the flannel annotations you have on your nodes (something like kubectl get nodes -o yaml | grep flannel.alpha)?
Somehow, I think one of your nodes has a PublicIP of 172.16.0.0 which it shouldn't do. The 172.16/16 range should be reserved for the vxlan network.
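One way to act on this suggestion is to compare each node's flannel.alpha.coreos.com/public-ip annotation against the pod network. The following is a minimal Python sketch, not part of flannel. The function name public_ips_inside_network is hypothetical, and it assumes the annotation lines have been captured as text from the kubectl command above. The second public-ip value in the sample is made up to illustrate the suspected misconfiguration.

```python
import ipaddress
import re

# Matches lines such as: flannel.alpha.coreos.com/public-ip: 10.1.130.165
PUBLIC_IP_RE = re.compile(r"flannel\.alpha\.coreos\.com/public-ip:\s*\"?([0-9.]+)\"?")


def public_ips_inside_network(annotation_text, pod_network):
    """Return the public IPs in annotation_text that fall inside pod_network."""
    network = ipaddress.ip_network(pod_network)
    found = PUBLIC_IP_RE.findall(annotation_text)
    return [ip for ip in found if ipaddress.ip_address(ip) in network]


# 172.28.249.156 is the address from the log above; 172.16.0.0 is an
# invented value showing what a bad annotation would look like.
sample = """\
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/public-ip: 172.28.249.156
flannel.alpha.coreos.com/public-ip: 172.16.0.0
"""

print(public_ips_inside_network(sample, "172.16.0.0/16"))
```

An empty list would suggest the annotations are fine and the cause lies elsewhere.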
I have a similar issue, same versions of flannel, k8s. Using vxlan, flannel is up and running, no errors in the logs (not even the error above).
kubeadm 1.8.1
k8s 1.8.0
flannel 0.9
ubuntu 16.04
docker 17.03ce
I've tried combinations of k8s as far back as 1.6 and flannel as far back as 0.8, all with the same results.
I'm able to connect pod <-> pod and host <-> pod as long as the pods are on that host. All hosts can communicate with each other without issues. I've spent almost a month fiddling with iptables, routes, etc and cannot figure this out. I'm seeing traffic via tcpdump on the cni0 bridge, but my pods aren't getting it. IIRC, last night I was using iptstate and was seeing udp traffic on the bridge when I expected tcp. Maybe this is the issue? It's also possible I was seeing something else...
Should I open another ticket, or piggy back on this one?
I'm running into the same issue it seems.
I1026 22:38:06.797811 208 vxlan_network.go:56] watching for new subnet leases
I1026 22:38:06.800429 208 ipmasq.go:75] Some iptables rules are missing; deleting and recreating rules
I1026 22:38:06.800450 208 ipmasq.go:97] Deleting iptables rule: -s 172.17.0.0/16 -d 172.17.0.0/16 -j RETURN
I1026 22:38:06.801507 208 ipmasq.go:97] Deleting iptables rule: -s 172.17.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1026 22:38:06.802527 208 ipmasq.go:97] Deleting iptables rule: ! -s 172.17.0.0/16 -d 172.17.9.0/24 -j RETURN
I1026 22:38:06.803535 208 ipmasq.go:97] Deleting iptables rule: ! -s 172.17.0.0/16 -d 172.17.0.0/16 -j MASQUERADE
I1026 22:38:06.804543 208 ipmasq.go:85] Adding iptables rule: -s 172.17.0.0/16 -d 172.17.0.0/16 -j RETURN
I1026 22:38:06.806706 208 ipmasq.go:85] Adding iptables rule: -s 172.17.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1026 22:38:06.808932 208 ipmasq.go:85] Adding iptables rule: ! -s 172.17.0.0/16 -d 172.17.9.0/24 -j RETURN
I1026 22:38:06.811148 208 ipmasq.go:85] Adding iptables rule: ! -s 172.17.0.0/16 -d 172.17.0.0/16 -j MASQUERADE
E1026 22:38:11.064786 208 vxlan_network.go:158] failed to add vxlanRoute (172.17.0.0/24 -> 172.17.0.0): invalid argument
E1027 02:51:24.265565 208 vxlan_network.go:158] failed to add vxlanRoute (172.17.0.0/24 -> 172.17.0.0): invalid argument
@tomdee none of my nodes have that as the public ip annotation (they're all correct).
I don't see a route for 172.17.0.0/24 on any of my hosts.
$ ip route
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.17.1.0/24 via 172.17.1.0 dev flannel.1 onlink
172.17.2.0/24 via 172.17.2.0 dev flannel.1 onlink
172.17.3.0/24 via 172.17.3.0 dev flannel.1 onlink
172.17.4.0/24 via 172.17.4.0 dev flannel.1 onlink
172.17.5.0/24 via 172.17.5.0 dev flannel.1 onlink
172.17.6.0/24 via 172.17.6.0 dev flannel.1 onlink
172.17.7.0/24 via 172.17.7.0 dev flannel.1 onlink
172.17.8.0/24 via 172.17.8.0 dev flannel.1 onlink
172.17.9.2 dev cali299270d87b6 scope link
172.17.9.3 dev calib63aee49779 scope link
172.17.9.4 dev cali12d4a061371 scope link
$ arp -a
...
? (172.17.0.0) at <incomplete> on flannel.1
...
Flannel logs
I1027 12:53:29.439503 166 vxlan_network.go:138] adding subnet: 172.17.0.0/24 PublicIP: 10.65.27.18 VtepMAC: 46:ee:d0:82:55:a4
I1027 12:53:29.439524 166 device.go:179] calling AddARP: 172.17.0.0, 46:ee:d0:82:55:a4
I1027 12:53:29.439591 166 device.go:156] calling AddFDB: <hostip>, 46:ee:d0:82:55:a4
E1027 12:53:29.439668 166 vxlan_network.go:158] failed to add vxlanRoute (172.17.0.0/24 -> 172.17.0.0): invalid argument
I1027 12:53:29.439706 166 device.go:190] calling DelARP: 172.17.0.0, 46:ee:d0:82:55:a4
I1027 12:53:29.439751 166 device.go:168] calling DelFDB: <hostip>, 46:ee:d0:82:55:a4
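When several nodes report this error, it can help to pull the failing subnet and gateway out of the flannel logs in one pass. This is a small Python sketch under the assumption that the log lines have the shape shown above. The names ROUTE_ERR_RE and failed_routes are hypothetical helpers, not flannel APIs.

```python
import re

# Captures: subnet, gateway, and the error reason from a vxlanRoute failure.
ROUTE_ERR_RE = re.compile(
    r"failed to add vxlanRoute \(([^\s)]+) -> ([^\s)]+)\): (.+)$"
)


def failed_routes(log_text):
    """Return a list of (subnet, gateway, reason) tuples, one per error line."""
    results = []
    for line in log_text.splitlines():
        match = ROUTE_ERR_RE.search(line)
        if match:
            subnet, gateway, reason = match.groups()
            results.append((subnet, gateway, reason.strip()))
    return results


# Log lines copied from the output above.
sample_log = """\
I1027 12:53:29.439503 166 vxlan_network.go:138] adding subnet: 172.17.0.0/24 PublicIP: 10.65.27.18 VtepMAC: 46:ee:d0:82:55:a4
I1027 12:53:29.439524 166 device.go:179] calling AddARP: 172.17.0.0, 46:ee:d0:82:55:a4
E1027 12:53:29.439668 166 vxlan_network.go:158] failed to add vxlanRoute (172.17.0.0/24 -> 172.17.0.0): invalid argument
I1027 12:53:29.439706 166 device.go:190] calling DelARP: 172.17.0.0, 46:ee:d0:82:55:a4
"""

for subnet, gateway, reason in failed_routes(sample_log):
    print(subnet, gateway, reason)
```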
I had this error too when transitioning from 1.7.5 to 1.8.2. A reboot solved it for me. (For completeness: prior to this I deleted the fstab swap entry because kubelet requires that the system doesn't swap. Not sure if this is related.)
@camflan please open a different issue. I suspect you just need "iptables -P FORWARD ACCEPT"
@jhorwit2 @senwangrockets I think the problem could be that you have the same IP range configured for your Docker bridge as you do for flannel. If you're using kubeadm, did you specify --pod-network-cidr 10.244.0.0/16?
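The overlap described here can be checked mechanically. Below is a minimal Python sketch using the standard ipaddress module. The function name networks_overlap is hypothetical, and the ranges are the ones visible in the logs and ip route output earlier in this thread (172.17.0.0/16 on docker0 and in flannel's ipmasq rules).

```python
import ipaddress


def networks_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR ranges share any addresses."""
    # strict=False also accepts interface-style input such as 172.17.0.1/16.
    a = ipaddress.ip_network(cidr_a, strict=False)
    b = ipaddress.ip_network(cidr_b, strict=False)
    return a.overlaps(b)


# docker0 range vs. the flannel network seen in the ipmasq rules above.
print(networks_overlap("172.17.0.0/16", "172.17.0.0/16"))
# docker0 range vs. the pod network suggested for kubeadm.
print(networks_overlap("172.17.0.0/16", "10.244.0.0/16"))
```

If the first kind of check reports an overlap on a node, moving either the Docker bridge or the flannel network to a different range is the change suggested in this comment.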
@tomdee that was my issue. Sorry I forgot to post after I realized that.
@tomdee Hi Tom,
I initialised my cluster with the same kubeadm command:
kubeadm init --pod-network-cidr 10.244.0.0/16
But I still see errors in the flannel pods:
E1210 07:10:45.198903 1 vxlan_network.go:158] failed to add vxlanRoute (10.244.2.0/24 -> 10.244.2.0): invalid argument
I have a 4-host cluster. 2 of the hosts work fine, but the other 2 fail to schedule containers,
which always stay in the "ContainerCreating" state.
The errors I see are:
Dec 10 01:39:14 kongapi-poc-db1 kubelet: E1210 01:39:14.554032 58034 cni.go:250] Error while adding to cni network: "cni0" already has an IP address different from 10.244.3.1/24
Dec 10 01:39:14 kongapi-poc-db1 kernel: cni0: port 1(veth7b12c96f) entered disabled state
Dec 10 01:39:14 kongapi-poc-db1 kernel: device veth7b12c96f left promiscuous mode
Dec 10 01:39:14 kongapi-poc-db1 kernel: cni0: port 1(veth7b12c96f) entered disabled state
Dec 10 01:39:14 kongapi-poc-db1 NetworkManager[702]: <info> [1512898754.6477] device (veth7b12c96f): released from master device cni0
Dec 10 01:39:14 kongapi-poc-db1 kubelet: E1210 01:39:14.655974 58034 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "tomcat-d6b5b9647-prq9w_tomcat" network: "cni0" already has an IP address different from 10.244.3.1/24
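The kubelet error says cni0 already holds an address other than the one flannel leased to this node. A rough way to confirm the mismatch is to compare the FLANNEL_SUBNET value in /run/flannel/subnet.env (the file flannel writes, per the logs at the top of this thread) with the address on cni0, for example as shown by ip -4 addr show cni0. The Python sketch below assumes the usual KEY=VALUE layout of subnet.env. The helper names are hypothetical, and the stale cni0 address in the example is invented for illustration.

```python
import ipaddress


def parse_subnet_env(text):
    """Parse KEY=VALUE lines from the contents of subnet.env into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def cni0_matches_lease(subnet_env_text, cni0_cidr):
    """Return True if cni0's address equals the FLANNEL_SUBNET lease."""
    expected = parse_subnet_env(subnet_env_text).get("FLANNEL_SUBNET")
    if expected is None:
        return False
    return ipaddress.ip_interface(expected) == ipaddress.ip_interface(cni0_cidr)


# Assumed file contents; only FLANNEL_SUBNET matters for this check.
subnet_env = """\
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.3.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
"""

print(cni0_matches_lease(subnet_env, "10.244.3.1/24"))  # address agrees with lease
print(cni0_matches_lease(subnet_env, "10.244.2.1/24"))  # hypothetical stale address
```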
Having the same problem. 4 nodes, 2 masters and 2 workers. the .167 and .168 are the workers and .167 is the one that's having issues adding the route.
Output of: kubectl get nodes -o yaml |grep flannel.alpha
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"d2:28:18:cd:1d:82"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 10.1.130.165
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"b6:67:12:1c:d9:c4"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 10.1.130.166
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"aa:e0:31:6e:d1:ef"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 10.1.130.167
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"16:13:d5:7c:c5:e2"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 10.1.130.168
Are the invalid gateway addresses treated as multicast addresses by Linux?
The subnet allocation in flannel skips the multicast addresses: https://github.com/coreos/flannel/blob/master/subnet/config.go#L86-L93. But the podCidr
allocated by the "controller manager" does not skip the first subnet.
@tomdee
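To see which of the reports in this thread involve the first subnet of the pod network, the failing gateway can be compared with the base address of the whole network. This Python sketch only does that arithmetic; it says nothing about how the kernel treats such addresses. The function name gateway_is_network_base is hypothetical, and the /16 networks paired with each gateway are taken from the ipmasq rules and kubeadm command quoted above.

```python
import ipaddress


def gateway_is_network_base(gateway, pod_network):
    """Return True if gateway is the base (network) address of pod_network."""
    network = ipaddress.ip_network(pod_network)
    return ipaddress.ip_address(gateway) == network.network_address


# (gateway from the error line, pod network it was reported under)
reports = [
    ("172.16.0.0", "172.16.0.0/16"),
    ("172.17.0.0", "172.17.0.0/16"),
    ("10.244.2.0", "10.244.0.0/16"),
]

for gateway, network in reports:
    print(gateway, network, gateway_is_network_base(gateway, network))
```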
I am not sure if this will help, but you might want to delete all the network/bridge devices before initializing k8s again. I had similar issues, and destroying the VMs and creating new ones resolved them. However, the issues might not be the same.
After reading the flannel documentation, it was not obvious to me that flannel works with one CIDR only. But after the change things are much better, although I now have other issues.
@senwangrockets @kumarganesh2814, I have the same problem. Have you solved it?
I got the same problem; here is how I resolved it. I have a setup with 1 master and 2 worker nodes, all of them VMs, with fixed IPs and hostnames on my local area network. The master and 1 worker node are OK; the other worker node has this problem.
When I see something like this: vxlan_network.go:158] failed to add vxlanRoute (10.244.2.0/24 -> 10.244.2.0): invalid argument, I log onto that machine and check the IP address of cni0, which could be a different address. You could delete the interface and let the cluster regenerate it. In my case, though, I realized the flannel.1 interface had not been created.
So I deleted the node, manually deleted the associated pods from the master, ran kubeadm reset on the problematic worker node, and rejoined. But flannel.1 never appeared. In the end, I deleted the node from the master and did a reset again, restarted the VM, and joined the master as normal; then flannel.1 appeared. I then did a deployment on the master, and cni0 and veth appeared on the worker node.
TLDR: not sure whether it will work for you, but: delete the worker node from the master, run kubeadm reset on the worker node, clean up, restart the VM, and join the master as normal.
I also faced this problem. In my case it was because the network interfaces that flanneld used could not reach each other. I switched to another network interface and that solved it.
Mine looks weird: one node has flannel.alpha.coreos.com/public-ip: 10.0.3.15. This is my master, and now my master cannot ping the others over flannel. What actually happened here, and how do I edit the flannel.alpha annotation on my master?
kubectl get nodes -o yaml |grep flannel.alpha
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"16:cb:5c:78:57:cb"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 192.168.14.3
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"7e:1e:e8:f6:8f:77"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 192.168.14.4
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"06:cd:6a:ba:6b:54"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 10.0.3.15
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"96:71:0e:48:52:4d"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 192.168.14.2
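In the output above, one public-ip sits on a different network from the rest, which may mean flannel picked a different interface on that node (flanneld's --iface option is the usual way to pin the interface). The Python sketch below flags such outliers. It assumes the nodes are expected to share a /24, which is a guess based on the addresses shown, and the function name outlier_public_ips is hypothetical.

```python
import ipaddress
from collections import Counter


def outlier_public_ips(public_ips, prefix=24):
    """Return the IPs that are not in the most common /prefix network."""
    if not public_ips:
        return []
    nets = [
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in public_ips
    ]
    majority, _count = Counter(nets).most_common(1)[0]
    return [ip for ip, net in zip(public_ips, nets) if net != majority]


# public-ip annotation values copied from the output above.
ips = ["192.168.14.3", "192.168.14.4", "10.0.3.15", "192.168.14.2"]
print(outlier_public_ips(ips))
```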
Check whether the flannel.1 IP conflicts with the docker0 IP. If it does, change the subnet's IP range.
Sorry, who is your answer addressed to?
@rthamrin I was answering this question: "failed to add vxlanRoute (10.244.2.0/24 -> 10.244.2.0): invalid argument"
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.