flannel not creating cni0 or veth interfaces on one node
I have a Kubernetes environment created manually in vSphere with three worker nodes. I've installed flannel using the YAML from here:
https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
The one change I made was switching the backend from vxlan to host-gw. VXLAN didn't work in my environment, probably because I don't have distributed switching configured on the ESXi hosts. (host-gw installs plain routes via the node IPs instead of encapsulating, so it only needs the nodes to share an L2 segment.)
Expected Behavior
Each node should have a cni0 interface with an IP address derived from the pod CIDR as well as several veth interfaces. Routing tables should update to reflect the routes to the pod CIDRs on the other nodes. Here's output from node #2:
ifconfig
cni0 Link encap:Ethernet HWaddr ae:85:74:b8:f1:4e
inet addr:10.244.1.1 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::ac85:74ff:feb8:f14e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:308224 errors:0 dropped:0 overruns:0 frame:0
TX packets:296430 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:24686474 (24.6 MB) TX bytes:1385790869 (1.3 GB)
ens160 Link encap:Ethernet HWaddr 00:50:56:aa:5f:2d
inet addr:10.180.11.195 Bcast:10.180.11.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:feaa:5f2d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1250386 errors:0 dropped:175 overruns:0 frame:0
TX packets:418707 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1545089936 (1.5 GB) TX bytes:40886026 (40.8 MB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:194 errors:0 dropped:0 overruns:0 frame:0
TX packets:194 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:15726 (15.7 KB) TX bytes:15726 (15.7 KB)
veth04e19e3b Link encap:Ethernet HWaddr 02:28:ed:e3:64:37
inet6 addr: fe80::28:edff:fee3:6437/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:71015 errors:0 dropped:0 overruns:0 frame:0
TX packets:75305 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6143154 (6.1 MB) TX bytes:6847736 (6.8 MB)
veth40da2910 Link encap:Ethernet HWaddr da:d2:a8:33:90:0b
inet6 addr: fe80::d8d2:a8ff:fe33:900b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:237209 errors:0 dropped:0 overruns:0 frame:0
TX packets:221156 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:22858456 (22.8 MB) TX bytes:1378945435 (1.3 GB)
Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
0.0.0.0         10.180.11.1     0.0.0.0         UG    0   0      0    ens160
10.180.11.0     0.0.0.0         255.255.255.0   U     0   0      0    ens160
10.244.0.0      10.180.11.194   255.255.255.0   UG    0   0      0    ens160
10.244.1.0      0.0.0.0         255.255.255.0   U     0   0      0    cni0
10.244.2.0      10.180.11.196   255.255.255.0   UG    0   0      0    ens160
And here's node 3:
ifconfig
cni0 Link encap:Ethernet HWaddr c6:af:b6:bd:19:df
inet addr:10.244.2.1 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::c4af:b6ff:febd:19df/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:276735 errors:0 dropped:0 overruns:0 frame:0
TX packets:314344 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:43575778 (43.5 MB) TX bytes:63961863 (63.9 MB)
ens160 Link encap:Ethernet HWaddr 00:50:56:aa:ab:b4
inet addr:10.180.11.196 Bcast:10.180.11.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:feaa:abb4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:569322 errors:0 dropped:158 overruns:0 frame:0
TX packets:284413 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:494184506 (494.1 MB) TX bytes:346410379 (346.4 MB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:194 errors:0 dropped:0 overruns:0 frame:0
TX packets:194 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:15726 (15.7 KB) TX bytes:15726 (15.7 KB)
veth64075cea Link encap:Ethernet HWaddr 3a:64:8e:da:96:eb
inet6 addr: fe80::3864:8eff:feda:96eb/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:78998 errors:0 dropped:0 overruns:0 frame:0
TX packets:89833 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:15257080 (15.2 MB) TX bytes:10282634 (10.2 MB)
veth9564535e Link encap:Ethernet HWaddr e6:44:18:02:cd:5f
inet6 addr: fe80::e444:18ff:fe02:cd5f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:197737 errors:0 dropped:0 overruns:0 frame:0
TX packets:224543 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:32192988 (32.1 MB) TX bytes:53681601 (53.6 MB)
Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
0.0.0.0         10.180.11.1     0.0.0.0         UG    0   0      0    ens160
10.180.11.0     0.0.0.0         255.255.255.0   U     0   0      0    ens160
10.244.0.0      10.180.11.194   255.255.255.0   UG    0   0      0    ens160
10.244.1.0      10.180.11.195   255.255.255.0   UG    0   0      0    ens160
10.244.2.0      0.0.0.0         255.255.255.0   U     0   0      0    cni0
Current Behavior
Node one does not have cni0 or veth interfaces; however, its routing table has the routes to the other nodes:
ens160 Link encap:Ethernet HWaddr 00:50:56:aa:f4:35
inet addr:10.180.11.194 Bcast:10.180.11.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:feaa:f435/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:477072 errors:0 dropped:168 overruns:0 frame:0
TX packets:212542 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:452413052 (452.4 MB) TX bytes:286003380 (286.0 MB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:194 errors:0 dropped:0 overruns:0 frame:0
TX packets:194 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:15726 (15.7 KB) TX bytes:15726 (15.7 KB)
Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
0.0.0.0         10.180.11.1     0.0.0.0         UG    0   0      0    ens160
10.180.11.0     0.0.0.0         255.255.255.0   U     0   0      0    ens160
10.244.1.0      10.180.11.195   255.255.255.0   UG    0   0      0    ens160
10.244.2.0      10.180.11.196   255.255.255.0   UG    0   0      0    ens160
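To narrow down where flannel stopped, it may help to compare the failing node against a healthy one; the pod name below is a placeholder for the flannel pod scheduled on node 1:

kubectl -n kube-system get pods -o wide | grep flannel
kubectl -n kube-system logs <FLANNEL_POD_ON_NODE_1>
# and on each node, compare what was actually written to disk:
ls -l /etc/cni/net.d /run/flannel /var/lib/cni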
Possible Solution
Unknown
Steps to Reproduce (for bugs)
- kubelet.service on node 1:
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=cri-containerd.service
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \
  --node-ip=10.180.11.194 \
  --allow-privileged=true \
  --anonymous-auth=false \
  --authorization-mode=Webhook \
  --client-ca-file=/var/lib/kubernetes/ca.pem \
  --cloud-provider= \
  --cluster-dns=10.32.0.10 \
  --cluster-domain=cluster.local \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
  --network-plugin=cni \
  --pod-cidr=10.244.0.0/24 \
  --image-pull-progress-deadline=2m \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --register-node=true \
  --runtime-request-timeout=15m \
  --tls-cert-file=/var/lib/kubelet/10.180.11.194.pem \
  --tls-private-key-file=/var/lib/kubelet/10.180.11.194-key.pem \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
- kubelet.service on node 2
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=cri-containerd.service
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \
  --node-ip=10.180.11.195 \
  --allow-privileged=true \
  --anonymous-auth=false \
  --authorization-mode=Webhook \
  --client-ca-file=/var/lib/kubernetes/ca.pem \
  --cloud-provider= \
  --cluster-dns=10.32.0.10 \
  --cluster-domain=cluster.local \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
  --image-pull-progress-deadline=2m \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --network-plugin=cni \
  --pod-cidr=10.244.1.0/24 \
  --register-node=true \
  --runtime-request-timeout=15m \
  --tls-cert-file=/var/lib/kubelet/10.180.11.195.pem \
  --tls-private-key-file=/var/lib/kubelet/10.180.11.195-key.pem \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
- kubelet.service on node 3
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=cri-containerd.service
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \
  --node-ip=10.180.11.196 \
  --allow-privileged=true \
  --anonymous-auth=false \
  --authorization-mode=Webhook \
  --client-ca-file=/var/lib/kubernetes/ca.pem \
  --cloud-provider= \
  --cluster-dns=10.32.0.10 \
  --cluster-domain=cluster.local \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
  --image-pull-progress-deadline=2m \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --network-plugin=cni \
  --pod-cidr=10.244.2.0/24 \
  --register-node=true \
  --runtime-request-timeout=15m \
  --tls-cert-file=/var/lib/kubelet/10.180.11.196.pem \
  --tls-private-key-file=/var/lib/kubelet/10.180.11.196-key.pem \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
- kube-controller-manager.service on master
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
  --address=0.0.0.0 \
  --cluster-cidr=10.244.0.0/16 \
  --cluster-name=kubernetes \
  --cluster-signing-cert-file=/var/lib/kubernetes/ca.pem \
  --cluster-signing-key-file=/var/lib/kubernetes/ca-key.pem \
  --leader-elect=true \
  --master=http://127.0.0.1:8080 \
  --root-ca-file=/var/lib/kubernetes/ca.pem \
  --service-account-private-key-file=/var/lib/kubernetes/ca-key.pem \
  --service-cluster-ip-range=10.32.0.0/24 \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
- The PodCIDR values weren't derived from the --pod-cidr setting in the kubelet.service file, so I added them manually using:
kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"<SUBNET>"}}'
Here's the output from the kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}' command showing that the PodCIDRs are correct now:
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
10.244.0.0/24 10.244.1.0/24 10.244.2.0/24
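For reference, a sketch that applies the same patch to all three workers in one pass; the node names are placeholders to substitute from kubectl get nodes:

for entry in <NODE1>:10.244.0.0/24 <NODE2>:10.244.1.0/24 <NODE3>:10.244.2.0/24; do
  node="${entry%%:*}"; cidr="${entry#*:}"   # split the "name:cidr" pairs
  kubectl patch node "$node" -p "{\"spec\":{\"podCIDR\":\"$cidr\"}}"
done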
- Updated kube-flannel.yml to use host-gw:
net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "host-gw"
    }
  }
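Note that flanneld reads net-conf.json only at startup, so if the ConfigMap is changed on a running cluster the flannel pods need to be recreated for the new backend to take effect; with the labels from the upstream manifest that would be something like:

kubectl -n kube-system delete pod -l app=flannel   # the DaemonSet recreates them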
- Here's the /etc/cni/net.d/10-flannel.conflist which flannel created. It's identical on each node:
{
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
- And here's the subnet.env on the three nodes:
/run/flannel # cat subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true

/run/flannel # cat subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true

/run/flannel # cat subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.2.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
Context
Trying to enable networking between the three workers.
Your Environment
- Flannel version: flannel:v0.10.0-amd64
- Backend used (e.g. vxlan or udp): host-gw
- Etcd version: etcdctl version: 3.3.9
- Kubernetes version (if used): v1.11.3
- Operating System and version: Ubuntu 16.04.5 LTS
- Link to your project (optional):
I'd like to check etcd to see what the podCIDRs look like in there, but I don't know how to do that or if it's even possible.
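In case it helps, node objects are stored under the /registry/minions/ prefix in etcd. A sketch using the etcd v3 API; the endpoint, cert paths, and node name are assumptions, and the values are protobuf-encoded, hence the strings filter:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/kubernetes/ca.pem \
  --cert=/var/lib/kubernetes/kubernetes.pem \
  --key=/var/lib/kubernetes/kubernetes-key.pem \
  get /registry/minions/<NODE_NAME> | strings | grep 10.244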
I do not know why this is happening, but I was able to reproduce this bug. Experimentally, I discovered that there is no configuration under /etc/cni/net.d/ on the bad nodes. A possible solution: copy /etc/cni/net.d/* from the master and manually place it on the bad nodes. The configs apply immediately, and you can then test the inter-node network.
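A minimal sketch of that copy, assuming root SSH access between the nodes:

ssh root@<BAD_NODE> 'mkdir -p /etc/cni/net.d'
scp /etc/cni/net.d/* root@<BAD_NODE>:/etc/cni/net.d/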
I have one master and one worker node. For me, the cni0 interface is missing on the master node, while it is being created on the worker node. Flannel is running on both nodes and reports no errors but I cannot get any network traffic across the nodes using the overlay IPs because of the missing cni0 interface on the master node.
Every time flannel doesn't work on a node, I run this (on the node):
mkdir -p /etc/cni/net.d
cd /etc/cni/net.d
# This is the zipped CNI config which should have been deployed by the
# flannel pod, but wasn't, for an unknown reason
cat << EOF | openssl base64 -d | xz -d > 10-flannel.conflist
/Td6WFoAAATm1rRGAgAhARwAAAAQz1jM4AEKAJFdAD2CgBccLouJnyT/6A8zPtZS
xLRFcjIbx3pn6UV/UpoPAEjPLRmPz8u5fwxtKGvSxeMWHNVeyJ2Vpb491DXaBjHk
hP/DcMJyv+4mJL330vZDjgFq9OUqbVG0Nx6n6BAMRfhEYAqrhEcyjIQJVsTAgWVi
ODNmTWnAm3vdSjAtesWbiM+PR2FP/IK0cGdsy1VvzDQAAAAAXN3PLZF7zbAAAa0B
iwIAAAkaa2KxxGf7AgAAAAAEWVo=
EOF
After this I see the flannel pods in Creating status:
watch -n1 kubectl get pods --all-namespaces
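For reference, the payload above is just the conflist xz-compressed and base64-encoded; an equivalent blob can be produced on any working node with:

xz -z < /etc/cni/net.d/10-flannel.conflist | openssl base64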
@MrEcco This configuration is present under /etc/cni/net.d/10-flannel.conflist on my master node but still there is no cni0 interface.
I just noticed that /var/lib/cni/ does not exist on my master node. Shouldn't that be created by flannel?
It should. I have this in a working cluster:
root@kube-master:/var/lib/cni# find .
./flannel
./flannel/<64_hex_symbols>
./flannel/<other_64_hex_symbols>
./networks
./networks/cbr0
./networks/cbr0/10.244.0.4
./networks/cbr0/last_reserved_ip.0
./networks/cbr0/10.244.0.5
Are you sure you turned off SELinux? Maybe you are using custom iptables policies? Or is this a problem with the connection between datacenters? After https://github.com/coreos/flannel/issues/1039#issuecomment-435896167, do you see flannel pods in the kube-system namespace? Are the nodes resolvable by their hostnames?
I am running this on Ubuntu 18.04, which does not have SELinux installed or enabled by default. I also did not add any iptables policies myself.
I can see that for each node a flannel pod is running:
ubuntu@ip-172-33-1-142:~$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS IP NODE
kube-flannel-ds-amd64-knnmh 1/1 Running 0 172.33.1.142 ip-172-33-1-142
kube-flannel-ds-amd64-vqp2v 1/1 Running 0 172.33.1.188 ip-172-33-1-188
kube-flannel-ds-msgdj 1/1 Running 0 172.33.1.188 ip-172-33-1-188
kube-flannel-ds-xhjwk 1/1 Running 0 172.33.1.142 ip-172-33-1-142
The master (.142) and worker (.188) nodes can ping each other by IP and also by hostname.
On the master node there is no cni folder under /var/lib:
# on master node:
ubuntu@ip-172-33-1-142:~$ cd /var/lib/cni
-bash: cd: /var/lib/cni: No such file or directory
On the worker node the folder exists and has the flannel and network subfolders as in your find . output.
I made some progress on this today. I had only one pod running on the master and it was configured with hostNetwork: true. As soon as I set this to hostNetwork: false and redeployed the pod, flannel started to create the cni0 interface.
Now I have a cni0 interface on my master node, but I am unable to communicate across nodes using the overlay network.
My master has 10.244.0.0/24 while my worker node has 10.244.1.0/24. I can ping pods from my master node using the master's overlay subnet (e.g. 10.244.0.x), and I can ping pods from my worker node using the worker node's overlay subnet (e.g. 10.244.1.x). But I cannot get any traffic (e.g. pings or even HTTP) across the overlay network. So I cannot reach a pod's HTTP server on the worker node from my master node using the overlay IP of the pod.
Solved that final issue too: port 8472, which VXLAN needs, was not open in my AWS security group.
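For anyone hitting the same wall, a hypothetical AWS CLI call to open it; the security group ID is a placeholder, and UDP 8472 is flannel's VXLAN port:

aws ec2 authorize-security-group-ingress \
  --group-id <SG_ID> \
  --protocol udp --port 8472 \
  --source-group <SG_ID>   # allow VXLAN between nodes in the same group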
Hi @lanoxx, I have the same issue. I deleted the /var/lib/cni folder and thought it would be regenerated when I started the cluster, but that is not the case. Any advice on how to recreate the cni0 interface?
It's OK, I regenerated the /var/lib/cni folder and created the cni0 interface using only:
kubeadm init
I changed the name 10-flannel.conflist to 10-flannel.conf, and everything is working.
I met this problem also; it looks like everything is okay!
$ find .
.
./flannel
./networks
./networks/cbr0
./networks/cbr0/lock
./networks/cbr0/last_reserved_ip.0
./networks/k8s-pod-network
./networks/k8s-pod-network/lock
./networks/k8s-pod-network/last_reserved_ip.0
./networks/k8s-pod-network/10.244.1.14
./networks/k8s-pod-network/10.244.1.15
./networks/k8s-pod-network/10.244.1.16
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:73:bc:94 brd ff:ff:ff:ff:ff:ff
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:bf:06:f7:a0 brd ff:ff:ff:ff:ff:ff
5: veth4c74228@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
link/ether b2:1f:77:cd:0c:21 brd ff:ff:ff:ff:ff:ff link-netnsid 0
6: cali72ab19ef985@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
7: calib4fa419c46f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
8: calibd644a79066@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 3
9: cali19ce750e5bd@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 4
10: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 0e:a3:65:be:4f:0e brd ff:ff:ff:ff:ff:ff
$ ls /etc/cni/net.d
10-canal.conflist 10-flannel.conflist calico-kubeconfig
@johnnylei, you cannot use more than one CNI plugin at the same time: it spawns too many conflicts. First clean the old CNI leftovers out of the cluster, reboot each node after the cleanup (the simplest way to remove the stale interfaces), and try again.
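A minimal sketch of such a cleanup, run on each node; the file and interface names are taken from the listings above:

rm -f /etc/cni/net.d/10-canal.conflist /etc/cni/net.d/calico-kubeconfig
ip link delete flannel.1   # remove the leftover vxlan device
reboot                     # clears the remaining veth/cali interfaces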
Should I remove them with rm -f 10-canal.conflist calico-kubeconfig?
After a system reboot, flannel is unable to create the cni0 interface, which causes all pods to end up in an unknown state.
I was facing a similar issue: cni0 was missing in my Kubernetes setup. Sharing the root cause I found, in case it gives any pointers for debugging the issue in your environment.
In my environment I was using CRI-O as the container runtime. I tried kubeadm init on the first master node and it failed because of a haproxy configuration issue. So I ran kubeadm reset on that node; when that command finishes, it prints instructions saying that /etc/cni/net.d/ won't be cleared by kubeadm reset and needs to be deleted manually. And that's what I did - deleted the entire directory :-/. After resolving the haproxy configuration I ran kubeadm init again and it went well, and the flannel deployment went well too (with flannel.1 showing up in ip link), but cni0 was missing, and for that reason the coredns pods were stuck in ContainerCreating forever.
The root cause was the deletion of /etc/cni/net.d: it contained two files, 100-crio-bridge.conf and 200-loopback.conf, which hold the cni0 configuration, and those were deleted along with the directory. These configuration files are created when the CRI-O runtime is installed. I reinstalled the CRI-O runtime to restore them, ran kubeadm init again, and then I could see cni0 and the coredns containers reached the Running state.
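For reference, the bridge config that CRI-O installs looks approximately like this; 10.88.0.0/16 is CRI-O's own default pod subnet, unrelated to the flannel network above:

# /etc/cni/net.d/100-crio-bridge.conf (approximate upstream default)
{
  "cniVersion": "0.3.1",
  "name": "crio-bridge",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.88.0.0/16",
    "routes": [{ "dst": "0.0.0.0/0" }]
  }
}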