
routes not being set up for daemonset pods on newly added nodes until the pod is restarted manually

Open shashankv02 opened this issue 5 years ago • 6 comments

When a new node is added to the cluster (using kubeadm join), the daemonset pods start automatically. Sometimes one of these pods doesn't get its pod network set up properly. Just deleting the pod and letting it get recreated fixes the issue. The issue occurs randomly and is not consistently reproducible.

# kubectl get pod --all-namespaces -owide | grep 10.102.162.224
healthbot              configmanager-f82n2                                0/1     Init:ImagePullBackOff   0          2d22h   10.244.85.4      10.102.162.224   <none>           <none>
healthbot              speaker-dgqbl                                      0/1     ImagePullBackOff        0          2d22h   10.102.162.224   10.102.162.224   <none>           <none>
healthbot              udf-farm-7zh8b                                     0/1     Init:ImagePullBackOff   0          2d22h   10.244.85.3      10.102.162.224   <none>           <none>
kube-system            calico-node-ssd6p                                  1/1     Running                 0          2d22h   10.102.162.224   10.102.162.224   <none>           <none>
kube-system            calicoctl                                          1/1     Running                 0          34m     10.102.162.224   10.102.162.224   <none>           <none>
kube-system            docker-registry-docker-registry-proxy-h4n5q        1/1     Running                 0          2d22h   10.244.85.1      10.102.162.224   <none>           <none>
kube-system            kube-proxy-f9nc2                                   1/1     Running                 0          2d22h   10.102.162.224   10.102.162.224   <none>           <none>

This node, 10.102.162.224, is a newly joined node. Observe the three pods with the IPs 10.244.85.1, 10.244.85.3 and 10.244.85.4. The pod with the networking problem is the one with IP 10.244.85.1.

# calicoctl get workloadendpoints -n kube-system
NAMESPACE     WORKLOAD                                           NODE             NETWORKS            INTERFACE
kube-system   calico-kube-controllers-5b55f5fcc5-nvfd9           10.102.161.189   10.244.202.197/32   cali0fee7c57f46
kube-system   coredns-6955765f44-6rfng                           10.102.161.189   10.244.202.194/32   cali2a348b98402
kube-system   coredns-6955765f44-xw47r                           10.102.161.189   10.244.202.196/32   cali2456c980bec
kube-system   docker-registry-docker-registry-86bd4bb577-8bbrd   10.102.161.189   10.244.202.198/32   cali3423c615441
kube-system   docker-registry-docker-registry-proxy-7sj4d        10.102.162.125   10.244.178.1/32     calibfa4a90919f
kube-system   docker-registry-docker-registry-proxy-8jtdd        10.102.161.201   10.244.38.193/32    cali26d291963b1
kube-system   docker-registry-docker-registry-proxy-bk5ts        10.102.161.189   10.244.202.202/32   calic0988a005de
kube-system   docker-registry-docker-registry-proxy-f66df        10.102.161.184   10.244.14.193/32    cali2aaf4a7c73b
kube-system   docker-registry-docker-registry-proxy-h4n5q        10.102.162.224   10.244.85.1/32      calid14f1515eca
kube-system   docker-registry-docker-registry-proxy-vp462        10.102.162.120   10.244.146.1/32     calicb508674e6d
kube-system   tiller-deploy-969865475-gp2zp                      10.102.161.189   10.244.202.195/32   cali7f8841ab762

calid14f1515eca is the Calico interface assigned to the pod with IP 10.244.85.1.

# ip route
default via 10.102.175.254 dev eth0
10.102.160.0/20 dev eth0  proto kernel  scope link  src 10.102.162.224
10.244.14.192/26 via 10.102.161.184 dev tunl0  proto bird onlink
10.244.38.192/26 via 10.102.161.201 dev tunl0  proto bird onlink
blackhole 10.244.85.0/26  proto bird
10.244.85.3 dev cali6eb5f2198f0  scope link
10.244.85.4 dev califee61023751  scope link
10.244.146.0/26 via 10.102.162.120 dev tunl0  proto bird onlink
10.244.178.0/26 via 10.102.162.125 dev tunl0  proto bird onlink
10.244.202.192/26 via 10.102.161.189 dev tunl0  proto bird onlink
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown

Observe that 10.244.85.3 and 10.244.85.4 have routes configured, but 10.244.85.1 does not.

No interface has been created on the host for the problematic pod (calid14f1515eca):

# ifconfig | grep cali6eb5f2198f0
cali6eb5f2198f0 Link encap:Ethernet  HWaddr ee:ee:ee:ee:ee:ee
# ifconfig | grep califee61023751
califee61023751 Link encap:Ethernet  HWaddr ee:ee:ee:ee:ee:ee
# ifconfig | grep calid14f1515eca
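
(Side note, not part of the original diags: ip link gives an unambiguous answer here, since it errors out when the device doesn't exist; on a typical iproute2 it looks something like the following.)

# ip link show calid14f1515eca
Device "calid14f1515eca" does not exist.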

There are a lot of "Interface down" logs from the calico-node pod on this node:

2020-12-03 09:19:55.980 [INFO][56] route_table.go 237: Queueing a resync of routing table. ipVersion=0x4
2020-12-03 09:19:55.981 [INFO][56] route_table.go 577: Syncing routes: adding new route. ifaceName="calid14f1515eca" ipVersion=0x4 targetCIDR=10.244.85.1/32
2020-12-03 09:19:55.981 [WARNING][56] route_table.go 604: Failed to add route error=network is down ifaceName="calid14f1515eca" ipVersion=0x4 targetCIDR=10.244.85.1/32
2020-12-03 09:19:55.981 [INFO][56] route_table.go 247: Trying to connect to netlink
2020-12-03 09:19:55.981 [INFO][56] route_table.go 361: Interface down, will retry if it goes up. ifaceName="calid14f1515eca" ipVersion=0x4
2020-12-03 09:19:55.982 [INFO][56] int_dataplane.go 967: Finished applying updates to dataplane. msecToApply=2.513651
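
For reference, logs like the above can be pulled with a plain kubectl logs against the calico-node pod on the affected node (a sketch only; the container name may differ depending on how Calico was installed):

# kubectl -n kube-system logs calico-node-ssd6p -c calico-node | grep -E 'Interface down|Failed to add route'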

# kubectl describe pod calico-node-ssd6p  | grep "Start Time"
Start Time:           Mon, 30 Nov 2020 02:52:39 -0800

# kubectl describe pod docker-registry-docker-registry-proxy-h4n5q | grep "Start Time"
Start Time:   Mon, 30 Nov 2020 02:52:39 -0800

# kubectl describe pod udf-farm-7zh8b -n healthbot  | grep "Start Time"
Start Time:   Mon, 30 Nov 2020 02:53:09 -0800

# kubectl describe pod configmanager-f82n2 -n healthbot  | grep "Start Time"
Start Time:   Mon, 30 Nov 2020 02:53:09 -0800

Observe that calico-node and the problematic pod docker-registry-docker-registry-proxy-h4n5q started at almost the same time, whereas the pods that have no networking issues started ~30 seconds later. So I'm guessing there is a race condition here between the calico-node pod and the other daemonsets' pods, depending on which one starts first?

One difference between the docker-registry-docker-registry-proxy daemonset and the other daemonsets, which started a bit later, is that both calico-node and docker-registry-docker-registry-proxy are installed in the kube-system namespace, whereas the others are installed in a different namespace. I'm not sure if this makes a difference.

# kubectl describe node 10.102.162.224 | grep CIDR
PodCIDR:                      10.244.40.0/21
PodCIDRs:                     10.244.40.0/21
# kubectl cluster-info dump | grep -m 1 cluster-cidr
                            "--cluster-cidr=10.244.0.0/16",
# kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
                            "--service-cluster-ip-range=10.96.0.0/12",
# kubectl get node
NAME             STATUS                     ROLES    AGE    VERSION
10.102.161.184   Ready                      <none>   3d     v1.17.2
10.102.161.189   Ready                      master   3d1h   v1.17.2
10.102.161.201   Ready,SchedulingDisabled   <none>   3d     v1.17.2
10.102.162.120   Ready,SchedulingDisabled   <none>   3d     v1.17.2
10.102.162.125   Ready                      <none>   3d     v1.17.2
10.102.162.224   Ready                      <none>   3d     v1.17.2

Just deleting this pod and letting Kubernetes recreate it resolves the networking issue.
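
For anyone else hitting this, the workaround is just the above in command form (a sketch; the pod name will differ in your cluster), followed by re-checking the route and host-side veth:

# kubectl -n kube-system delete pod docker-registry-docker-registry-proxy-h4n5q
# ip route | grep 10.244.85
# ifconfig | grep cali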

Your Environment

  • Calico version: v3.12.3

  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes 1.17.2

  • Operating System and version: Ubuntu 16.04

shashankv02 avatar Dec 03 '20 11:12 shashankv02

Hi @shashankv02, thanks for the diags. Have you tried using a more recent Calico version? There have been a lot of improvements to Felix since v3.12 that might have addressed this issue.

lmm avatar Dec 09 '20 23:12 lmm

Yeah, this is curious. The CNI plugin should be setting up that interface, and is the same component that should be allocating the IP address (which we see is working).

One thought - do you have any other CNI configurations besides the Calico config in /etc/cni/net.d on your hosts? Perhaps this pod is being launched with a different CNI plugin before Calico gets a chance to install its config?

If the host-side veth doesn't exist, I'm guessing another CNI plugin might have created it with a different name?
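
Something like the following (rough sketch) would show both: any non-Calico conflist, or any veths not named cali*, would point at another plugin.

# ls -l /etc/cni/net.d/
# ip -o link show type veth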

caseydavenport avatar Dec 10 '20 17:12 caseydavenport

@caseydavenport There is only one CNI.

# /etc/cni/net.d# ls -lrt
total 8
-rw------- 1 root root 2623 Nov 30 02:52 calico-kubeconfig
-rw-r--r-- 1 root root  533 Nov 30 02:52 10-calico.conflist

Is there any other diagnostic information I should collect that might be helpful?


@lmm Thanks for the suggestion. I will try a more recent version of Calico on a development cluster whenever possible. I don't want to upgrade anything on the production clusters without making sure the exact issue is fixed in a later version, to avoid new regressions or behaviour changes. As this is hard to reproduce, it is also hard to verify whether the issue is fixed or not.

shashankv02 avatar Dec 14 '20 14:12 shashankv02

Been trying to recreate this by removing and adding nodes with a script. I reproduced the same issue, i.e. I cannot access the docker-registry-proxy pod at hostPort 5000 on the new node, but the root cause seems to be different this time. The interface and static routes have been set up properly on the new node this time, but the hostPort iptables rules are not written. The docker-registry-proxy pod listens on hostPort 5000.

On a working node:

# iptables -t nat -S | grep 5000
-A CNI-DN-9083734ff9e63d966eb7c -s 10.244.140.202/32 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-9083734ff9e63d966eb7c -s 127.0.0.1/32 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-9083734ff9e63d966eb7c -p tcp -m tcp --dport 5000 -j DNAT --to-destination 10.244.140.202:80
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"11fe14ea06bfb57b3f2be805c242488fb9c668f88f5d8d128d2f79b8238e0c4c\"" -m multiport --dports 5000 -j CNI-DN-9083734ff9e63d966eb7c
-A KUBE-SEP-I4W7JNC5C7ZGNZMS -p tcp -m tcp -j DNAT --to-destination 10.244.140.222:50001
-A KUBE-SEP-RBXXJB6X2JQO4EBC -p tcp -m tcp -j DNAT --to-destination 10.244.140.199:5000
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.103.168.132/32 -p tcp -m comment --comment "kube-system/docker-registry-docker-registry:registry cluster IP" -m tcp --dport 5000 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.103.168.132/32 -p tcp -m comment --comment "kube-system/docker-registry-docker-registry:registry cluster IP" -m tcp --dport 5000 -j KUBE-SVC-NNGLEMTREGRS54DN

On newly joined node:

# iptables -t nat -S | grep 5000
-A KUBE-SEP-I4W7JNC5C7ZGNZMS -p tcp -m tcp -j DNAT --to-destination 10.244.140.222:50001
-A KUBE-SEP-RBXXJB6X2JQO4EBC -p tcp -m tcp -j DNAT --to-destination 10.244.140.199:5000
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.103.168.132/32 -p tcp -m comment --comment "kube-system/docker-registry-docker-registry:registry cluster IP" -m tcp --dport 5000 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.103.168.132/32 -p tcp -m comment --comment "kube-system/docker-registry-docker-registry:registry cluster IP" -m tcp --dport 5000 -j KUBE-SVC-NNGLEMTREGRS54DN

Observe the missing NAT rules on the new node. Again, deleting the pod and letting it be recreated fixes the issue:

# kubectl delete pod docker-registry-docker-registry-proxy-q79sk -n kube-system
pod "docker-registry-docker-registry-proxy-q79sk" deleted


# iptables -t nat -S | grep 5000
-A CNI-DN-f2f6741aa0634f9d6a631 -s 10.244.34.72/32 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-f2f6741aa0634f9d6a631 -s 127.0.0.1/32 -p tcp -m tcp --dport 5000 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-f2f6741aa0634f9d6a631 -p tcp -m tcp --dport 5000 -j DNAT --to-destination 10.244.34.72:80
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"23eb4dadcc65546f5a19342032734ca5544f9a9fde63ef7db6ae94084ff5fad6\"" -m multiport --dports 5000 -j CNI-DN-f2f6741aa0634f9d6a631
-A KUBE-SEP-I4W7JNC5C7ZGNZMS -p tcp -m tcp -j DNAT --to-destination 10.244.140.222:50001
-A KUBE-SEP-RBXXJB6X2JQO4EBC -p tcp -m tcp -j DNAT --to-destination 10.244.140.199:5000
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.103.168.132/32 -p tcp -m comment --comment "kube-system/docker-registry-docker-registry:registry cluster IP" -m tcp --dport 5000 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.103.168.132/32 -p tcp -m comment --comment "kube-system/docker-registry-docker-registry:registry cluster IP" -m tcp --dport 5000 -j KUBE-SVC-NNGLEMTREGRS54DN
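
For context, those CNI-HOSTPORT/CNI-DN chains come from the upstream portmap chained plugin rather than from calico-node itself; a rough way to confirm it is enabled in the CNI config (using the conflist name shown earlier) is something like:

# grep -A3 portmap /etc/cni/net.d/10-calico.conflist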

shashankv02 avatar Dec 16 '20 06:12 shashankv02

@shashankv02 I think that sounds like a different issue than the original one reported? In your original description it appeared that the interface for the pod didn't exist at all on the host - is that correct?

The hostPort plugin is responsible for those rules, and we've seen issues with it in the past. I think there's a strong case for that being part of the problem here, and a similar case for us ditching the upstream plugin and implementing those rules ourselves in Calico.

caseydavenport avatar Mar 09 '21 17:03 caseydavenport

@caseydavenport Yeah, the symptom is similar, i.e. not being able to connect to a daemonset-created pod on a newly joined node, but the underlying cause seems to be different.

shashankv02 avatar Mar 09 '21 17:03 shashankv02

This issue is stale because it is kind/enhancement or kind/bug and has been open for 180 days with no activity.

github-actions[bot] avatar Jul 19 '25 06:07 github-actions[bot]

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] avatar Aug 18 '25 06:08 github-actions[bot]