The k8s node loses network connectivity (drops all packets) after the calico-node pod is manually restarted in eBPF mode

Open drummerglen opened this issue 11 months ago • 0 comments

Hi guys,

I have been testing Calico's eBPF mode over the last few days. When I manually killed the calico-node pod as a reliability test, the node lost its network and could no longer respond to any external connections.

I had to log in to the node through an external console (such as VMRC or iBMC) and run tc -s qdisc show dev ens160, which showed that the number of dropped packets kept increasing. The node's network only returned to normal after I rebooted the node.

I have searched the official Calico documentation, Google and GitHub for a long time and have not found an answer. This problem is very tricky, and if it happened in a production environment it would be a disaster!

Please feel free to give advice! Orz

Expected Behavior

The k8s node keeps working normally after the calico-node pod is restarted.

Current Behavior

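After the calico-node pod is restarted, the k8s node loses network connectivity: it drops all packets and does not respond to external connections until the node is rebooted.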

Possible Solution

Steps to Reproduce (for bugs)

1. Make sure Calico has eBPF mode enabled. Here is the FelixConfiguration:

2. Make sure all calico-node DaemonSet pods are running normally and that pods can see the client's real source IP.

3. Kill the calico-node pod manually. The node loses its network and, after a while, becomes NotReady.

4. Check the node's network connectivity and the status of the eBPF clsact qdisc, using the commands ping %gateway% and tc -s qdisc show dev ens160. The node cannot transmit any ICMP packets, and the number of drops grows rapidly.
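
To make the check in step 4 repeatable, here is a minimal Python sketch that samples tc -s qdisc show dev ens160 twice and reports how fast the drop counters are growing. The function names (total_dropped, read_tc_stats, drops_per_second) and the 5-second interval are my own choices for illustration, not part of Calico or tc. It assumes the usual tc statistics line format, "(dropped N, overlimits ...)", and simply sums every dropped counter it finds, so treat the trend rather than the absolute number as the signal.

```python
import re
import subprocess
import time

# Matches the "dropped N" field in tc statistics lines such as:
#   Sent 0 bytes 0 pkt (dropped 4, overlimits 0 requeues 0)
DROPPED_RE = re.compile(r"dropped (\d+)")


def total_dropped(tc_output: str) -> int:
    """Sum the 'dropped' counters of every qdisc listed in the tc output."""
    return sum(int(n) for n in DROPPED_RE.findall(tc_output))


def read_tc_stats(dev: str) -> str:
    """Run `tc -s qdisc show dev <dev>` and return its stdout."""
    result = subprocess.run(
        ["tc", "-s", "qdisc", "show", "dev", dev],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def drops_per_second(dev: str, interval: float = 5.0) -> float:
    """Sample the drop counters twice and return the growth rate."""
    before = total_dropped(read_tc_stats(dev))
    time.sleep(interval)
    after = total_dropped(read_tc_stats(dev))
    return (after - before) / interval
```

Calling drops_per_second("ens160") from the node's console before and after killing the calico-node pod should make the difference visible: a value near zero on a healthy node, and a steadily positive value once the node starts dropping packets.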

Context

This is just a lab environment, but if it happened in a production environment it would be a disaster!

Your Environment

Calico version: 3.24.5
Orchestrator version (e.g. kubernetes, mesos, rkt): v1.25.9, on-prem kubeadm Kubernetes cluster with one master and two worker nodes
Operating System and version: Ubuntu 20.04.6
Link to your project (optional): None

Mar 23 '24 17:03 drummerglen
