
flannel not creating cni or veth interfaces on one node

Open sgmacdougall opened this issue 7 years ago • 18 comments

I have a kubernetes environment created manually in vsphere with three worker nodes. I've installed flannel using the yaml from here:

https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Except I changed the backend from vxlan to host-gw. VXLAN didn't work in my environment, probably because I don't have distributed switching configured on the ESXi hosts.

Expected Behavior

Each node should have a cni0 interface with an IP address derived from the pod CIDR as well as several veth interfaces. Routing tables should update to reflect the routes to the pod CIDRs on the other nodes. Here's output from node #2:

ifconfig
cni0      Link encap:Ethernet  HWaddr ae:85:74:b8:f1:4e
          inet addr:10.244.1.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::ac85:74ff:feb8:f14e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:308224 errors:0 dropped:0 overruns:0 frame:0
          TX packets:296430 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:24686474 (24.6 MB)  TX bytes:1385790869 (1.3 GB)

ens160    Link encap:Ethernet  HWaddr 00:50:56:aa:5f:2d
          inet addr:10.180.11.195  Bcast:10.180.11.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feaa:5f2d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1250386 errors:0 dropped:175 overruns:0 frame:0
          TX packets:418707 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1545089936 (1.5 GB)  TX bytes:40886026 (40.8 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:194 errors:0 dropped:0 overruns:0 frame:0
          TX packets:194 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:15726 (15.7 KB)  TX bytes:15726 (15.7 KB)

veth04e19e3b Link encap:Ethernet  HWaddr 02:28:ed:e3:64:37
          inet6 addr: fe80::28:edff:fee3:6437/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:71015 errors:0 dropped:0 overruns:0 frame:0
          TX packets:75305 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6143154 (6.1 MB)  TX bytes:6847736 (6.8 MB)

veth40da2910 Link encap:Ethernet  HWaddr da:d2:a8:33:90:0b
          inet6 addr: fe80::d8d2:a8ff:fe33:900b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:237209 errors:0 dropped:0 overruns:0 frame:0
          TX packets:221156 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:22858456 (22.8 MB)  TX bytes:1378945435 (1.3 GB)

Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window  irtt Iface
0.0.0.0         10.180.11.1     0.0.0.0         UG    0   0          0 ens160
10.180.11.0     0.0.0.0         255.255.255.0   U     0   0          0 ens160
10.244.0.0      10.180.11.194   255.255.255.0   UG    0   0          0 ens160
10.244.1.0      0.0.0.0         255.255.255.0   U     0   0          0 cni0
10.244.2.0      10.180.11.196   255.255.255.0   UG    0   0          0 ens160

And here's node 3:

ifconfig
cni0      Link encap:Ethernet  HWaddr c6:af:b6:bd:19:df
          inet addr:10.244.2.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::c4af:b6ff:febd:19df/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:276735 errors:0 dropped:0 overruns:0 frame:0
          TX packets:314344 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:43575778 (43.5 MB)  TX bytes:63961863 (63.9 MB)

ens160    Link encap:Ethernet  HWaddr 00:50:56:aa:ab:b4
          inet addr:10.180.11.196  Bcast:10.180.11.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feaa:abb4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:569322 errors:0 dropped:158 overruns:0 frame:0
          TX packets:284413 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:494184506 (494.1 MB)  TX bytes:346410379 (346.4 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:194 errors:0 dropped:0 overruns:0 frame:0
          TX packets:194 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:15726 (15.7 KB)  TX bytes:15726 (15.7 KB)

veth64075cea Link encap:Ethernet  HWaddr 3a:64:8e:da:96:eb
          inet6 addr: fe80::3864:8eff:feda:96eb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:78998 errors:0 dropped:0 overruns:0 frame:0
          TX packets:89833 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:15257080 (15.2 MB)  TX bytes:10282634 (10.2 MB)

veth9564535e Link encap:Ethernet  HWaddr e6:44:18:02:cd:5f
          inet6 addr: fe80::e444:18ff:fe02:cd5f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:197737 errors:0 dropped:0 overruns:0 frame:0
          TX packets:224543 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:32192988 (32.1 MB)  TX bytes:53681601 (53.6 MB)

Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window  irtt Iface
0.0.0.0         10.180.11.1     0.0.0.0         UG    0   0          0 ens160
10.180.11.0     0.0.0.0         255.255.255.0   U     0   0          0 ens160
10.244.0.0      10.180.11.194   255.255.255.0   UG    0   0          0 ens160
10.244.1.0      10.180.11.195   255.255.255.0   UG    0   0          0 ens160
10.244.2.0      0.0.0.0         255.255.255.0   U     0   0          0 cni0
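With host-gw, flannel's job on each node essentially reduces to programming gateway routes like the ones above. As a rough sketch of what should end up on node 1 (peer IPs and the ens160 device name are taken from the output in this issue; the script only prints the commands, it does not apply them):

```shell
#!/bin/sh
# Print the `ip route` commands equivalent to the routes flannel's
# host-gw backend is expected to program. The device name ens160 and
# the peer addresses are assumptions taken from this thread.
host_gw_route() {
  printf 'ip route replace %s via %s dev ens160\n' "$1" "$2"
}

# Peers of node 1 (10.180.11.194):
host_gw_route 10.244.1.0/24 10.180.11.195
host_gw_route 10.244.2.0/24 10.180.11.196
```

Comparing this output against `route -n` on each node is a quick way to spot a node where flannel stopped short.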

Current Behavior

Node one does not have cni0 or veth interfaces; however, its routing table has the routes to the other nodes:

ens160    Link encap:Ethernet  HWaddr 00:50:56:aa:f4:35
          inet addr:10.180.11.194  Bcast:10.180.11.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feaa:f435/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:477072 errors:0 dropped:168 overruns:0 frame:0
          TX packets:212542 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:452413052 (452.4 MB)  TX bytes:286003380 (286.0 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:194 errors:0 dropped:0 overruns:0 frame:0
          TX packets:194 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:15726 (15.7 KB)  TX bytes:15726 (15.7 KB)

Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window  irtt Iface
0.0.0.0         10.180.11.1     0.0.0.0         UG    0   0          0 ens160
10.180.11.0     0.0.0.0         255.255.255.0   U     0   0          0 ens160
10.244.1.0      10.180.11.195   255.255.255.0   UG    0   0          0 ens160
10.244.2.0      10.180.11.196   255.255.255.0   UG    0   0          0 ens160
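When a node ends up in this state, it helps to check on the broken node whether each piece flannel depends on is actually in place. A minimal sketch (the paths are the defaults used elsewhere in this thread):

```shell
#!/bin/sh
# Quick checks for a node where flannel created no cni0/veth interfaces.
# Each check prints "ok:" or "MISS:"; run this on the affected node.
check() {
  desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "ok:   $desc"
  else
    echo "MISS: $desc"
  fi
}

run_checks() {
  check "cni0 interface exists"            ip link show cni0
  check "CNI config written by flannel"    test -f /etc/cni/net.d/10-flannel.conflist
  check "flannel wrote subnet.env"         test -f /run/flannel/subnet.env
}

run_checks
```

Note that cni0 itself is created lazily, on the first pod that gets a pod-network sandbox on the node, so a missing cni0 alone is not conclusive; a missing CNI config or subnet.env points more directly at flannel.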

Possible Solution

Unknown

Steps to Reproduce (for bugs)

  1. kubelet.service on node 1:

[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=cri-containerd.service
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \
  --node-ip=10.180.11.194 \
  --allow-privileged=true \
  --anonymous-auth=false \
  --authorization-mode=Webhook \
  --client-ca-file=/var/lib/kubernetes/ca.pem \
  --cloud-provider= \
  --cluster-dns=10.32.0.10 \
  --cluster-domain=cluster.local \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
  --network-plugin=cni \
  --pod-cidr=10.244.0.0/24 \
  --image-pull-progress-deadline=2m \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --register-node=true \
  --runtime-request-timeout=15m \
  --tls-cert-file=/var/lib/kubelet/10.180.11.194.pem \
  --tls-private-key-file=/var/lib/kubelet/10.180.11.194-key.pem \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

  1. kubelet.service on node 2

[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=cri-containerd.service
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \
  --node-ip=10.180.11.195 \
  --allow-privileged=true \
  --anonymous-auth=false \
  --authorization-mode=Webhook \
  --client-ca-file=/var/lib/kubernetes/ca.pem \
  --cloud-provider= \
  --cluster-dns=10.32.0.10 \
  --cluster-domain=cluster.local \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
  --image-pull-progress-deadline=2m \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --network-plugin=cni \
  --pod-cidr=10.244.1.0/24 \
  --register-node=true \
  --runtime-request-timeout=15m \
  --tls-cert-file=/var/lib/kubelet/10.180.11.195.pem \
  --tls-private-key-file=/var/lib/kubelet/10.180.11.195-key.pem \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

  1. kubelet.service on node 3

[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=cri-containerd.service
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \
  --node-ip=10.180.11.196 \
  --allow-privileged=true \
  --anonymous-auth=false \
  --authorization-mode=Webhook \
  --client-ca-file=/var/lib/kubernetes/ca.pem \
  --cloud-provider= \
  --cluster-dns=10.32.0.10 \
  --cluster-domain=cluster.local \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
  --image-pull-progress-deadline=2m \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --network-plugin=cni \
  --pod-cidr=10.244.2.0/24 \
  --register-node=true \
  --runtime-request-timeout=15m \
  --tls-cert-file=/var/lib/kubelet/10.180.11.196.pem \
  --tls-private-key-file=/var/lib/kubelet/10.180.11.196-key.pem \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

  1. kube-controller-manager.service on master

[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
  --address=0.0.0.0 \
  --cluster-cidr=10.244.0.0/16 \
  --cluster-name=kubernetes \
  --cluster-signing-cert-file=/var/lib/kubernetes/ca.pem \
  --cluster-signing-key-file=/var/lib/kubernetes/ca-key.pem \
  --leader-elect=true \
  --master=http://127.0.0.1:8080 \
  --root-ca-file=/var/lib/kubernetes/ca.pem \
  --service-account-private-key-file=/var/lib/kubernetes/ca-key.pem \
  --service-cluster-ip-range=10.32.0.0/24 \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

  1. The PodCIDR values weren't derived from the --pod-cidr setting in the kubelet.service file, so I manually added them using:

kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"<SUBNET>"}}'

Here's the output from the kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}' command showing that the PodCIDRs are correct now:

kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
10.244.0.0/24 10.244.1.0/24 10.244.2.0/24
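For a three-node cluster this patch can be scripted; a minimal sketch that just builds the patch payload passed to kubectl (the node names in the comment are placeholders, not names from this cluster):

```shell
#!/bin/sh
# Build the JSON patch used above; the result is what gets passed to
# `kubectl patch node <NODE_NAME> -p ...`.
pod_cidr_patch() {
  printf '{"spec":{"podCIDR":"%s"}}' "$1"
}

# Usage (worker-1 is a placeholder node name):
#   kubectl patch node worker-1 -p "$(pod_cidr_patch 10.244.0.0/24)"
pod_cidr_patch 10.244.0.0/24
echo
```

Note that a node's podCIDR is immutable once set, so this patch only succeeds while the field is still empty.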

  1. Updated kube-flannel.yml to use host-gw:

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "host-gw"
    }
  }

  1. Here's the /etc/cni/net.d/10-flannel.conflist which flannel created. It's identical on each node:

{
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

  1. And here's the subnet.env on the three nodes:

/run/flannel # cat subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true

/run/flannel # cat subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true

/run/flannel # cat subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.2.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
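These line up with the nodes' podCIDRs: for a /24 lease, flannel records the subnet's .1 gateway address as FLANNEL_SUBNET. A tiny sketch of that correspondence, useful for spot-checking a node:

```shell
#!/bin/sh
# For a /24 podCIDR, the FLANNEL_SUBNET value in /run/flannel/subnet.env
# is the .1 gateway address on that subnet (as in the three files above).
expected_flannel_subnet() {
  echo "${1%.0/24}.1/24"
}

expected_flannel_subnet 10.244.1.0/24   # prints 10.244.1.1/24
```

Comparing this against `grep FLANNEL_SUBNET /run/flannel/subnet.env` on each node confirms the lease matches the node's podCIDR.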

Context

Trying to enable networking between the three workers.

Your Environment

  • Flannel version: flannel:v0.10.0-amd64
  • Backend used (e.g. vxlan or udp): host-gw
  • Etcd version: etcdctl version: 3.3.9
  • Kubernetes version (if used): v1.11.3
  • Operating System and version:Ubuntu 16.04.5 LTS
  • Link to your project (optional):

sgmacdougall avatar Sep 18 '18 17:09 sgmacdougall

I'd like to check etcd to see what the podCIDRs look like in there, but I don't know how to do that or if it's even possible.

sgmacdougall avatar Sep 18 '18 22:09 sgmacdougall

I do not know why this is happening, but I was able to reproduce this bug. Experimentally, I discovered that there is no configuration under /etc/cni/net.d/* on the bad nodes. A possible solution: copy /etc/cni/net.d/* from the master and manually paste it onto the bad nodes. The configs apply immediately and you can then test the cluster network.

MrEcco avatar Oct 18 '18 16:10 MrEcco

I have one master and one worker node. For me, the cni0 interface is missing on the master node, while it is being created on the worker node. Flannel is running on both nodes and reports no errors but I cannot get any network traffic across the nodes using the overlay IPs because of the missing cni0 interface on the master node.

lanoxx avatar Nov 05 '18 13:11 lanoxx

Every time flannel doesn't work on a node, I run this (on the node):

mkdir -p /etc/cni/net.d
cd /etc/cni/net.d
# This is the zipped CNI config which should have been deployed by the
# flannel pod, but wasn't, for some unknown reason
cat << EOF | openssl base64 -d | xz -d > 10-flannel.conflist
/Td6WFoAAATm1rRGAgAhARwAAAAQz1jM4AEKAJFdAD2CgBccLouJnyT/6A8zPtZS
xLRFcjIbx3pn6UV/UpoPAEjPLRmPz8u5fwxtKGvSxeMWHNVeyJ2Vpb491DXaBjHk
hP/DcMJyv+4mJL330vZDjgFq9OUqbVG0Nx6n6BAMRfhEYAqrhEcyjIQJVsTAgWVi
ODNmTWnAm3vdSjAtesWbiM+PR2FP/IK0cGdsy1VvzDQAAAAAXN3PLZF7zbAAAa0B
iwIAAAkaa2KxxGf7AgAAAAAEWVo=
EOF

After this I see the flannel pods in Creating status:

watch -n1 kubectl get pods --all-namespaces

MrEcco avatar Nov 05 '18 14:11 MrEcco

@MrEcco This configuration is present under /etc/cni/net.d/10-flannel.conflist on my master node but still there is no cni0 interface.

lanoxx avatar Nov 05 '18 15:11 lanoxx

I just noticed that /var/lib/cni/ does not exist on my master node. Shouldn't that be created by flannel?

lanoxx avatar Nov 05 '18 15:11 lanoxx

It should. I have this on a working cluster:

root@kube-master:/var/lib/cni# find .
./flannel
./flannel/<64_hex_symbols>
./flannel/<other_64_hex_symbols>
./networks
./networks/cbr0
./networks/cbr0/10.244.0.4
./networks/cbr0/last_reserved_ip.0
./networks/cbr0/10.244.0.5

Are you sure you turned off SELinux? Maybe you are using custom iptables policies? Or is this a problem with the connection between datacenters? After https://github.com/coreos/flannel/issues/1039#issuecomment-435896167, do you see flannel pods in the kube-system namespace? Are the nodes resolvable by their hostnames?

MrEcco avatar Nov 05 '18 16:11 MrEcco

I am running this on Ubuntu 18.04, which does not have SELinux installed or enabled by default. I also did not add any iptables policies myself.

I can see that for each node a flannel pod is running:

ubuntu@ip-172-33-1-142:~$ kubectl get pods -n kube-system -o wide
NAME                                  READY   STATUS     RESTARTS   IP             NODE
kube-flannel-ds-amd64-knnmh           1/1     Running    0          172.33.1.142   ip-172-33-1-142   
kube-flannel-ds-amd64-vqp2v           1/1     Running    0          172.33.1.188   ip-172-33-1-188   
kube-flannel-ds-msgdj                 1/1     Running    0          172.33.1.188   ip-172-33-1-188   
kube-flannel-ds-xhjwk                 1/1     Running    0          172.33.1.142   ip-172-33-1-142   

The master (.142) and worker (.188) nodes can ping each other by IP and also by hostname.

On the master node there is no cni folder under /var/lib:

# on master node:
ubuntu@ip-172-33-1-142:~$ cd /var/lib/cni
-bash: cd: /var/lib/cni: No such file or directory

On the worker node the folder exists and has the flannel and network subfolders as in your find . output.

lanoxx avatar Nov 06 '18 09:11 lanoxx

I made some progress on this today. I had only one pod running on the master and it was configured with hostNetwork: true. As soon as I set this to hostNetwork: false and redeployed the pod, flannel started to create the cni0 interface.

Now I have a cni0 interface on my master node, but I am unable to communicate across nodes using the overlay network.

My master has 10.244.0.0/24 while my worker node has 10.244.1.0/24. I can ping pods from my master node using the master's overlay subnet (e.g. 10.244.0.x) and I can ping pods from my worker node using the worker node's overlay subnet (e.g. 10.244.1.x). But I cannot get any traffic (e.g. pings or even HTTP) across the overlay network. So I cannot reach a pod's HTTP server on the worker node from my master node using the overlay IP of the pod.

lanoxx avatar Nov 07 '18 13:11 lanoxx

Solved that final issue too: port 8472 was not open in my AWS security group, which is needed for VXLAN.
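For anyone hitting the same wall on AWS, the missing rule looks roughly like this. The security-group ID and source CIDR below are placeholders, and the sketch only prints the AWS CLI command rather than running it:

```shell
#!/bin/sh
# Print the AWS CLI command to open UDP 8472 (the kernel's default
# VXLAN port, used by flannel) between nodes. The sg ID and CIDR
# passed in below are placeholders.
open_vxlan_rule() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol udp --port 8472 --cidr %s\n' "$1" "$2"
}

open_vxlan_rule sg-0123456789abcdef0 172.33.0.0/16
```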

lanoxx avatar Nov 07 '18 13:11 lanoxx


Hi @lanoxx, I have the same issue. I deleted the /var/lib/cni folder and thought it would be regenerated when I started the cluster, but that is not the case. Any advice on how to recreate the cni0 interface?

ablaabiyad avatar Jan 13 '20 14:01 ablaabiyad

It's OK: I regenerated the /var/lib/cni folder and the cni0 interface was created, using only the kubeadm init command.

ablaabiyad avatar Jan 13 '20 15:01 ablaabiyad

I changed the name of 10-flannel.conflist to 10-flannel.conf, and everything is working.

tx19980520 avatar Feb 12 '20 03:02 tx19980520

I met this problem also; it looks like everything is okay!

$ find .
.
./flannel
./networks
./networks/cbr0
./networks/cbr0/lock
./networks/cbr0/last_reserved_ip.0
./networks/k8s-pod-network
./networks/k8s-pod-network/lock
./networks/k8s-pod-network/last_reserved_ip.0
./networks/k8s-pod-network/10.244.1.14
./networks/k8s-pod-network/10.244.1.15
./networks/k8s-pod-network/10.244.1.16
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:73:bc:94 brd ff:ff:ff:ff:ff:ff
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:bf:06:f7:a0 brd ff:ff:ff:ff:ff:ff
5: veth4c74228@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether b2:1f:77:cd:0c:21 brd ff:ff:ff:ff:ff:ff link-netnsid 0
6: cali72ab19ef985@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
7: calib4fa419c46f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
8: calibd644a79066@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 3
9: cali19ce750e5bd@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 4
10: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 0e:a3:65:be:4f:0e brd ff:ff:ff:ff:ff:ff
$ ls /etc/cni/net.d
10-canal.conflist  10-flannel.conflist  calico-kubeconfig

johnnylei avatar Mar 18 '20 16:03 johnnylei

I met this problem also; it looks like everything is okay!

...
$ ls /etc/cni/net.d
10-canal.conflist  10-flannel.conflist  calico-kubeconfig

@johnnylei, you cannot use more than one CNI plugin at the same time: it spawns too many conflicts. Clean up the old CNI stuff from the cluster first. Reboot each node after cleanup (the simplest way to remove the interfaces) and try again.

MrEcco avatar Mar 18 '20 20:03 MrEcco

I met this problem also; it looks like everything is okay!

...
$ ls /etc/cni/net.d
10-canal.conflist  10-flannel.conflist  calico-kubeconfig

@johnnylei, you cannot use more than one CNI plugin at the same time: it spawns too many conflicts. Clean up the old CNI stuff from the cluster first. Reboot each node after cleanup (the simplest way to remove the interfaces) and try again.

Should I remove them with rm -f 10-canal.conflist calico-kubeconfig?

johnnylei avatar Mar 19 '20 06:03 johnnylei

After a system reboot, flannel is unable to create the cni0 interface, which causes all pods to be in an Unknown state.

codinja1188 avatar Mar 18 '21 07:03 codinja1188

I was facing a similar issue: cni0 was missing in my Kubernetes setup. Sharing the root cause I found, in case it gives any pointer for debugging the issue in your environment.

In my environment I was using CRI-O as the container runtime. I tried kubeadm init on the first master node and it failed because of a haproxy configuration issue. So I did kubeadm reset on that node, and when that command finishes it prints instructions saying that /etc/cni/net.d/ won't be cleared by kubeadm reset, so you need to delete it manually. And that's what I did: deleted the entire directory :-/. After resolving the haproxy configuration, kubeadm init went well and even the flannel deployment went well (with the flannel.1 interface showing in ip link), but cni0 was missing, and for that reason the coredns pods were stuck in ContainerCreating state forever.

The root cause was the deletion of /etc/cni/net.d, because it contained two files, 100-crio-bridge.conf and 200-loopback.conf, that hold the cni0 configuration, and those files were deleted as well. These configuration files were created at the time of the CRI-O runtime installation. I reinstalled the CRI-O runtime to restore these configuration files, then tried kubeadm init again, and I was able to see cni0 and the coredns containers in the running state.
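Before reinstalling, a quick way to confirm you are in this state is to check whether the CRI-O-provided CNI configs are still present (filenames as reported in this comment):

```shell
#!/bin/sh
# Report whether the CNI configs installed by CRI-O survived
# `kubeadm reset` (filenames taken from the comment above).
check_crio_cni() {
  for f in 100-crio-bridge.conf 200-loopback.conf; do
    if [ -f "/etc/cni/net.d/$f" ]; then
      echo "present: $f"
    else
      echo "missing: $f"
    fi
  done
}

check_crio_cni
```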

vishnoianil avatar Mar 10 '22 00:03 vishnoianil

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 25 '23 20:01 stale[bot]