IP tables version auto-detection doesn't always work
After upgrading Canal from v3.19.1 to v3.22.0, pod communication is broken for newly started pods. It works only for pods that were already running before the upgrade. Rebooting the node resolves the issue.
Expected Behavior
After CNI upgrade, the newly started pods should be able to communicate with any other pod.
Current Behavior
A pod started after the upgrade is not able to communicate with any other pod in the cluster.
For example, pod A with IP 10.244.3.9 tries to ping pod B with IP 10.244.3.10. An iptables trace on the host shows that the packet is dropped in the cali-from-wl-dispatch chain:
trace id 76f540ab ip raw PREROUTING packet: iif "calie5da25cfd1a" ether saddr 76:0e:b8:ba:fa:08 ether daddr ee:ee:ee:ee:ee:ee ip saddr 10.244.3.9 ip daddr 10.244.3.10 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 60250 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 58087 icmp sequence 1 @th,64,96 857356887478869055311578368
trace id 76f540ab ip raw PREROUTING rule meta l4proto icmp ip daddr 10.244.3.10 counter packets 11 bytes 924 meta nftrace set 1 (verdict continue)
trace id 76f540ab ip raw PREROUTING verdict continue meta mark 0x00040000
trace id 76f540ab ip raw PREROUTING policy accept meta mark 0x00040000
trace id 76f540ab ip mangle PREROUTING packet: iif "calie5da25cfd1a" ether saddr 76:0e:b8:ba:fa:08 ether daddr ee:ee:ee:ee:ee:ee ip saddr 10.244.3.9 ip daddr 10.244.3.10 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 60250 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 58087 icmp sequence 1 @th,64,96 857356887478869055311578368
trace id 76f540ab ip mangle PREROUTING rule # xt_comment counter packets 528464 bytes 651457512 jump cali-PREROUTING (verdict jump cali-PREROUTING)
trace id 76f540ab ip mangle cali-PREROUTING rule # xt_comment counter packets 221165 bytes 17736879 jump cali-from-host-endpoint (verdict jump cali-from-host-endpoint)
trace id 76f540ab ip mangle cali-from-host-endpoint verdict continue meta mark 0x00040000
trace id 76f540ab ip mangle cali-PREROUTING verdict continue meta mark 0x00040000
trace id 76f540ab ip mangle PREROUTING verdict continue meta mark 0x00040000
trace id 76f540ab ip mangle PREROUTING policy accept meta mark 0x00040000
trace id 76f540ab ip nat PREROUTING packet: iif "calie5da25cfd1a" ether saddr 76:0e:b8:ba:fa:08 ether daddr ee:ee:ee:ee:ee:ee ip saddr 10.244.3.9 ip daddr 10.244.3.10 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 60250 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 58087 icmp sequence 1 @th,64,96 857356887478869055311578368
trace id 76f540ab ip nat PREROUTING rule # xt_comment counter packets 8305 bytes 691157 jump cali-PREROUTING (verdict jump cali-PREROUTING)
trace id 76f540ab ip nat cali-PREROUTING rule # xt_comment counter packets 8305 bytes 691157 jump cali-fip-dnat (verdict jump cali-fip-dnat)
trace id 76f540ab ip nat cali-fip-dnat verdict continue meta mark 0x00040000
trace id 76f540ab ip nat cali-PREROUTING verdict continue meta mark 0x00040000
trace id 76f540ab ip nat PREROUTING rule # xt_comment counter packets 8305 bytes 691157 jump KUBE-SERVICES (verdict jump KUBE-SERVICES)
trace id 76f540ab ip nat KUBE-SERVICES verdict continue meta mark 0x00040000
trace id 76f540ab ip nat PREROUTING verdict continue meta mark 0x00040000
trace id 76f540ab ip nat PREROUTING policy accept meta mark 0x00040000
trace id 76f540ab ip mangle FORWARD packet: iif "calie5da25cfd1a" oif "cali8efcee298dd" ether saddr 76:0e:b8:ba:fa:08 ether daddr ee:ee:ee:ee:ee:ee ip saddr 10.244.3.9 ip daddr 10.244.3.10 ip dscp cs0 ip ecn not-ect ip ttl 63 ip id 60250 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 58087 icmp sequence 1 @th,64,96 857356887478869055311578368
trace id 76f540ab ip mangle FORWARD verdict continue meta mark 0x00040000
trace id 76f540ab ip mangle FORWARD policy accept meta mark 0x00040000
trace id 76f540ab ip filter FORWARD packet: iif "calie5da25cfd1a" oif "cali8efcee298dd" ether saddr 76:0e:b8:ba:fa:08 ether daddr ee:ee:ee:ee:ee:ee ip saddr 10.244.3.9 ip daddr 10.244.3.10 ip dscp cs0 ip ecn not-ect ip ttl 63 ip id 60250 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 58087 icmp sequence 1 @th,64,96 857356887478869055311578368
trace id 76f540ab ip filter FORWARD rule # xt_comment counter packets 11739 bytes 3728960 jump cali-FORWARD (verdict jump cali-FORWARD)
trace id 76f540ab ip filter cali-FORWARD rule # xt_comment counter packets 11739 bytes 3728960 # xt_MARK (verdict continue)
trace id 76f540ab ip filter cali-FORWARD rule iifname "cali*" # xt_comment counter packets 9905 bytes 813300 jump cali-from-wl-dispatch (verdict jump cali-from-wl-dispatch)
trace id 76f540ab ip filter cali-from-wl-dispatch rule # xt_comment # xt_comment counter packets 8053 bytes 676400 drop (verdict drop)
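A trace like the one above can be captured with a TRACE rule in the raw table plus `nft monitor trace`. The sketch below shows one way to do that and to pull out the chain that issued the drop verdict. It assumes that `iptables` on the node is the nft variant; the helper name `drop_chain` is made up for illustration, and the field positions rely on the trace line layout shown above.

```shell
# Sketch only; run the capture commands as root on the affected node.
#   iptables -t raw -A PREROUTING -d 10.244.3.10/32 -p icmp -j TRACE
#   nft monitor trace > trace.txt      # stop with Ctrl-C after the ping
#   iptables -t raw -D PREROUTING -d 10.244.3.10/32 -p icmp -j TRACE

# drop_chain (hypothetical helper): read trace lines on stdin and print
# "<table> <chain>" for every rule that ended in a drop verdict.
drop_chain() {
  awk '/\(verdict drop\)/ { print $5, $6 }'
}

# Example: drop_chain < trace.txt
```

For the trace in this report it would print the filter table and the cali-from-wl-dispatch chain.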
It is dropped by the last rule in the cali-from-wl-dispatch chain:
Chain cali-from-wl-dispatch (2 references)
pkts bytes target prot opt in out source destination
1634 135K cali-fw-cali0eacf697eec all -- cali0eacf697eec * 0.0.0.0/0 0.0.0.0/0 [goto] /* cali:jfxMCpNf8Nj1KM-K */
1666 138K cali-fw-cali9b778b31de0 all -- cali9b778b31de0 * 0.0.0.0/0 0.0.0.0/0 [goto] /* cali:Z5r7QQDW3XChII0Z */
1529 111K cali-fw-caliec620394ae2 all -- caliec620394ae2 * 0.0.0.0/0 0.0.0.0/0 [goto] /* cali:j5kopEPZZL4XyLX1 */
8054 676K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:jP5qeZOJds6jo3_6 */ /* Unknown interface */
The chain cali-from-wl-dispatch does not seem to contain rules for the interfaces involved, but it does contain rules for interfaces that no longer exist:
$ ip route | grep cali
10.244.3.5 dev calib3c61c3cba9 scope link
10.244.3.9 dev calie5da25cfd1a scope link
10.244.3.10 dev cali8efcee298dd scope link
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:fc:08:ef brd ff:ff:ff:ff:ff:ff
altname enp0s3
altname ens3
inet 192.168.1.250/24 metric 1024 brd 192.168.1.255 scope global dynamic eth0
valid_lft 67408sec preferred_lft 67408sec
inet6 fe80::f816:3eff:fefc:8ef/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:7b:2a:96:2d brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
7: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether d2:61:31:39:5c:29 brd ff:ff:ff:ff:ff:ff
inet 10.244.3.0/32 brd 10.244.3.0 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::d061:31ff:fe39:5c29/64 scope link
valid_lft forever preferred_lft forever
13: calib3c61c3cba9@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-870bafee-f33a-9ff8-78d4-6ded80ae5f55
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
16: nodelocaldns: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether ba:d3:80:2a:6c:ae brd ff:ff:ff:ff:ff:ff
inet 169.254.20.10/32 brd 169.254.20.10 scope global nodelocaldns
valid_lft forever preferred_lft forever
inet6 fe80::b8d3:80ff:fe2a:6cae/64 scope link
valid_lft forever preferred_lft forever
20: calie5da25cfd1a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-507e2e66-63e7-5601-4b53-d29aaa78e282
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
21: cali8efcee298dd@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-2ff23a01-36db-5e18-4caf-c01ebc59454f
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
iptables dump: iptables-dump.txt
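The mismatch between the host interfaces and the dispatch chain can be checked mechanically. The following is a minimal sketch, not part of Calico: the function names are hypothetical, and it only handles rules that match an exact interface name with `-i`, as in the dump above. It inspects whichever backend the `iptables-save` binary on the node talks to.

```shell
# Sketch: compare the cali* interfaces present on the host with the
# interfaces that cali-from-wl-dispatch has rules for.

# host_cali_ifaces: read `ip -o link` output on stdin, print cali* names.
host_cali_ifaces() {
  awk -F': ' '{ split($2, a, "@"); if (a[1] ~ /^cali/) print a[1] }' | sort -u
}

# dispatch_ifaces: read `iptables-save` output on stdin, print the
# interface matched by each cali-from-wl-dispatch rule.
dispatch_ifaces() {
  awk '$1 == "-A" && $2 == "cali-from-wl-dispatch" && $3 == "-i" { print $4 }' | sort -u
}

# report_mismatch HOST_FILE RULES_FILE: both files sorted, one name per line.
report_mismatch() {
  comm -23 "$1" "$2" | sed 's/^/no dispatch rule for: /'
  comm -13 "$1" "$2" | sed 's/^/stale dispatch rule for: /'
}

# Example (as root):
#   ip -o link | host_cali_ifaces > host.txt
#   iptables-save -t filter | dispatch_ifaces > rules.txt
#   report_mismatch host.txt rules.txt
```

On a healthy node both lists should be empty; on the broken node described here, the new pods' interfaces would show up as missing and the three old ones as stale.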
There is no error in the calico-node logs, only a few INFO logs that might be related:
2022-03-15 12:13:00.734 [INFO][47] felix/route_table.go 1116: Failed to access interface because it doesn't exist. error=Link not found ifaceName="cali0eacf697eec" ifaceRegex="^cali.*" ipVersion=0x4 tableIndex=0
2022-03-15 12:13:00.734 [INFO][47] felix/route_table.go 1184: Failed to get interface; it's down/gone. error=Link not found ifaceName="cali0eacf697eec" ifaceRegex="^cali.*" ipVersion=0x4 tableIndex=0
calico-node logs: calico-node-logs.txt kube-flannel logs: kube-flannel-logs.txt
Steps to Reproduce (for bugs)
- Deploy a cluster with Canal v3.19.1
- Start a bunch of pods
- Upgrade Canal to v3.22.0
- Start a new pod
- The new pod is not able to communicate with any other pods
Your Environment
- Calico version v3.22.0
- Orchestrator version (e.g. kubernetes, mesos, rkt): k8s v1.21.6
- Operating System and version: Flatcar Container Linux stable 3033.2.3 (kernel 5.10.102-flatcar)
iptables diff before node reboot (broken pod communication) vs after node reboot (issue resolved):
1c1
< # Generated by iptables-save v1.8.7 on Wed Mar 16 07:49:08 2022
---
> # Generated by iptables-save v1.8.7 on Wed Mar 16 07:55:33 2022
25,26c25,26
< # Completed on Wed Mar 16 07:49:08 2022
< # Generated by iptables-save v1.8.7 on Wed Mar 16 07:49:08 2022
---
> # Completed on Wed Mar 16 07:55:33 2022
> # Generated by iptables-save v1.8.7 on Wed Mar 16 07:55:33 2022
35d34
< -A PREROUTING -d 10.244.3.10/32 -p icmp -j TRACE
46,47c45,46
< # Completed on Wed Mar 16 07:49:08 2022
< # Generated by iptables-save v1.8.7 on Wed Mar 16 07:49:08 2022
---
> # Completed on Wed Mar 16 07:55:33 2022
> # Generated by iptables-save v1.8.7 on Wed Mar 16 07:55:33 2022
49c48
< :INPUT ACCEPT [1306446:184386765]
---
> :INPUT ACCEPT [0:0]
51c50
< :OUTPUT ACCEPT [1318109:136128290]
---
> :OUTPUT ACCEPT [0:0]
70,74c69,72
< :cali-fw-cali0eacf697eec - [0:0]
< :cali-fw-cali9b778b31de0 - [0:0]
< :cali-fw-caliec620394ae2 - [0:0]
< :cali-pri-_PTRGc0U-L5Kz7V6ERW - [0:0]
< :cali-pri-_u2Tn2rSoAPffvE7JO6 - [0:0]
---
> :cali-fw-calib3c61c3cba9 - [0:0]
> :cali-fw-calie5da25cfd1a - [0:0]
> :cali-pri-_hNSGmJYNT8uLIzxesP - [0:0]
> :cali-pri-kns.default - [0:0]
76,77c74,76
< :cali-pro-_PTRGc0U-L5Kz7V6ERW - [0:0]
< :cali-pro-_u2Tn2rSoAPffvE7JO6 - [0:0]
---
> :cali-pri-ksa.default.default - [0:0]
> :cali-pro-_hNSGmJYNT8uLIzxesP - [0:0]
> :cali-pro-kns.default - [0:0]
78a78
> :cali-pro-ksa.default.default - [0:0]
82,84c82,83
< :cali-tw-cali0eacf697eec - [0:0]
< :cali-tw-cali9b778b31de0 - [0:0]
< :cali-tw-caliec620394ae2 - [0:0]
---
> :cali-tw-calib3c61c3cba9 - [0:0]
> :cali-tw-calie5da25cfd1a - [0:0]
118,119d116
< -A KUBE-SERVICES -d 10.99.243.89/32 -p udp -m comment --comment "kube-system/kube-dns-upstream:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
< -A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
121a119,120
> -A KUBE-SERVICES -d 10.99.243.89/32 -p udp -m comment --comment "kube-system/kube-dns-upstream:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
> -A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
139,204c138,191
< -A cali-from-wl-dispatch -i cali0eacf697eec -m comment --comment "cali:jfxMCpNf8Nj1KM-K" -g cali-fw-cali0eacf697eec
< -A cali-from-wl-dispatch -i cali9b778b31de0 -m comment --comment "cali:Z5r7QQDW3XChII0Z" -g cali-fw-cali9b778b31de0
< -A cali-from-wl-dispatch -i caliec620394ae2 -m comment --comment "cali:j5kopEPZZL4XyLX1" -g cali-fw-caliec620394ae2
< -A cali-from-wl-dispatch -m comment --comment "cali:jP5qeZOJds6jo3_6" -m comment --comment "Unknown interface" -j DROP
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:2XfnMIS0vRTpYprb" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:sh3xojT0_fvQdOLZ" -m conntrack --ctstate INVALID -j DROP
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:ilzx_AHB5PSu6SMc" -j MARK --set-xmark 0x0/0x10000
< -A cali-fw-cali0eacf697eec -p udp -m comment --comment "cali:xREVbzdA7xxJ_SG_" -m comment --comment "Drop VXLAN encapped packets originating in workloads" -m multiport --dports 4789 -j DROP
< -A cali-fw-cali0eacf697eec -p ipencap -m comment --comment "cali:tz6v7ZHtg61I9Q55" -m comment --comment "Drop IPinIP encapped packets originating in workloads" -j DROP
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:n2ZWa3lzF0n8PMX4" -j cali-pro-kns.kube-system
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:v5rWp8HwM2XFCZ1c" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:4EHK1l9fKWYf18OI" -j cali-pro-_u2Tn2rSoAPffvE7JO6
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:l5J-VdjvQtO3LrQk" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:0bFUwVAPKGeStwsl" -m comment --comment "Drop if no profiles matched" -j DROP
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:QRJCIR98QzvvuyJq" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:ay9LQvb2kTFZOB85" -m conntrack --ctstate INVALID -j DROP
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:ep8IpMUPxzsKay1U" -j MARK --set-xmark 0x0/0x10000
< -A cali-fw-cali9b778b31de0 -p udp -m comment --comment "cali:h792EEUG1ywehDav" -m comment --comment "Drop VXLAN encapped packets originating in workloads" -m multiport --dports 4789 -j DROP
< -A cali-fw-cali9b778b31de0 -p ipencap -m comment --comment "cali:wK-n2UQ42UdcxEhT" -m comment --comment "Drop IPinIP encapped packets originating in workloads" -j DROP
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:usXqpuRNvvg8arve" -j cali-pro-kns.kube-system
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:zCuDlID2OkmUTc-N" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:MvzzXy8GFn60l8Da" -j cali-pro-_u2Tn2rSoAPffvE7JO6
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:83E5fZh7LW22uW10" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:VTLnDAOcQy2MZIOt" -m comment --comment "Drop if no profiles matched" -j DROP
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:uzV2WeiGT17quDoH" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:eARjyNdXSAGkE7ms" -m conntrack --ctstate INVALID -j DROP
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:KHjIF3x506U0IyzQ" -j MARK --set-xmark 0x0/0x10000
< -A cali-fw-caliec620394ae2 -p udp -m comment --comment "cali:ghu0c_u8iQi8tYiX" -m comment --comment "Drop VXLAN encapped packets originating in workloads" -m multiport --dports 4789 -j DROP
< -A cali-fw-caliec620394ae2 -p ipencap -m comment --comment "cali:Il5cEDn7POq5E7rk" -m comment --comment "Drop IPinIP encapped packets originating in workloads" -j DROP
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:9zK_Jrii60k-rKjX" -j cali-pro-kns.kube-system
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:hnJMKh9MkGSFesMR" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:THcDORIKY6yGRJ6G" -j cali-pro-_PTRGc0U-L5Kz7V6ERW
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:tDQo0kJ41Y0e8caF" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:ikKzErH9dIIO-SiN" -m comment --comment "Drop if no profiles matched" -j DROP
< -A cali-pri-kns.kube-system -m comment --comment "cali:zoH5gU6U55FKZxEo" -j MARK --set-xmark 0x10000/0x10000
< -A cali-pri-kns.kube-system -m comment --comment "cali:bcGRIJcyOS9dgBiB" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-pro-kns.kube-system -m comment --comment "cali:-50oJuMfLVO3LkBk" -j MARK --set-xmark 0x10000/0x10000
< -A cali-pro-kns.kube-system -m comment --comment "cali:ztVPKv1UYejNzm1g" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-to-wl-dispatch -o cali0eacf697eec -m comment --comment "cali:VPrTvboHvIFHBSt1" -g cali-tw-cali0eacf697eec
< -A cali-to-wl-dispatch -o cali9b778b31de0 -m comment --comment "cali:WHPbZiPfWHDgLKjB" -g cali-tw-cali9b778b31de0
< -A cali-to-wl-dispatch -o caliec620394ae2 -m comment --comment "cali:aGYX_fDzHdiEIwAG" -g cali-tw-caliec620394ae2
< -A cali-to-wl-dispatch -m comment --comment "cali:-SrWckbj6EyTFkUR" -m comment --comment "Unknown interface" -j DROP
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:7FFwvJFKdOzlIM__" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:amlY9V8ff-q8RYyQ" -m conntrack --ctstate INVALID -j DROP
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:drGUdAs1y8Q3y5Ec" -j MARK --set-xmark 0x0/0x10000
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:fsHIb1BNTAQBxGRg" -j cali-pri-kns.kube-system
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:dSm3fgX2L51Qb27W" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:VsxKf7fD__Ggcu4S" -j cali-pri-_u2Tn2rSoAPffvE7JO6
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:64mCahwSw9FfCjaG" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:MinQYCCQ9CUT5LmE" -m comment --comment "Drop if no profiles matched" -j DROP
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:g1NLmquMZ1d3pkgJ" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:TkR9NEf_suBo3eij" -m conntrack --ctstate INVALID -j DROP
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:CLxUE1I4XiSWbEZZ" -j MARK --set-xmark 0x0/0x10000
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:afIWE8H6xc681NB5" -j cali-pri-kns.kube-system
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:8vkx1yUtqWSm3DOd" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:g4_eqvzr95kRNpJn" -j cali-pri-_u2Tn2rSoAPffvE7JO6
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:WIAKY9apNp3bkCL1" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:DuLVD27vb4qTIepy" -m comment --comment "Drop if no profiles matched" -j DROP
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:nnBX6C-bhiIU4J8D" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:xtGCf2UVq3LdACdA" -m conntrack --ctstate INVALID -j DROP
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:DiYMTT6TG0NG7NgN" -j MARK --set-xmark 0x0/0x10000
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:oXYaSq02GB-_1_gP" -j cali-pri-kns.kube-system
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:5cDLKNsjx-YQds-p" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:G2bOfKMYjtePpJ-F" -j cali-pri-_PTRGc0U-L5Kz7V6ERW
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:pFt3_bmCuapd3U2C" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:FY4yzFeVB_x3EhUj" -m comment --comment "Drop if no profiles matched" -j DROP
---
> -A cali-from-wl-dispatch -i calib3c61c3cba9 -m comment --comment "cali:-iFqLLPLKdTdHT3y" -g cali-fw-calib3c61c3cba9
> -A cali-from-wl-dispatch -i calie5da25cfd1a -m comment --comment "cali:GxF3GskLi8lTypH5" -g cali-fw-calie5da25cfd1a
> -A cali-from-wl-dispatch -m comment --comment "cali:YyQsK4pOzpK5SJvx" -m comment --comment "Unknown interface" -j DROP
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:bFkWuXTY4cHRsWCw" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:Yz28x-xlZn3iK1dm" -m conntrack --ctstate INVALID -j DROP
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:hNHUX-t0brxvobsq" -j MARK --set-xmark 0x0/0x10000
> -A cali-fw-calib3c61c3cba9 -p udp -m comment --comment "cali:zcm2POu6Mo0LAV1k" -m comment --comment "Drop VXLAN encapped packets originating in workloads" -m multiport --dports 4789 -j DROP
> -A cali-fw-calib3c61c3cba9 -p ipencap -m comment --comment "cali:DwmLW-ADFbvCVMfq" -m comment --comment "Drop IPinIP encapped packets originating in workloads" -j DROP
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:AdEJ83cx93vOFgSp" -j cali-pro-kns.default
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:cjz_9Zh_TDJTWzWk" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:0oinIgeZuK--S8aG" -j cali-pro-ksa.default.default
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:0_TxaYf8xQRbbTfr" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:PcQjLe_fD4RDZOPL" -m comment --comment "Drop if no profiles matched" -j DROP
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:DQTj2Ly76ps1vAfV" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:CXvRiR4x2LDFKt8A" -m conntrack --ctstate INVALID -j DROP
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:WE5TOgqrYenB8dk5" -j MARK --set-xmark 0x0/0x10000
> -A cali-fw-calie5da25cfd1a -p udp -m comment --comment "cali:Fyu95QLIPxazayxx" -m comment --comment "Drop VXLAN encapped packets originating in workloads" -m multiport --dports 4789 -j DROP
> -A cali-fw-calie5da25cfd1a -p ipencap -m comment --comment "cali:IlO35GnXY-zk9pDG" -m comment --comment "Drop IPinIP encapped packets originating in workloads" -j DROP
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:8aFJF7iWEDcpQkmW" -j cali-pro-kns.kube-system
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:OFSNaS_SPjCLBu_h" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:RQN0NspAfySpAEOJ" -j cali-pro-_hNSGmJYNT8uLIzxesP
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:TneHhNn7F7DX_fh9" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:alxKi3OQLkF6h1nM" -m comment --comment "Drop if no profiles matched" -j DROP
> -A cali-pri-_hNSGmJYNT8uLIzxesP -m comment --comment "cali:k9ZghIA0HRR2xDY1" -m comment --comment "Profile ksa.kube-system.default ingress"
> -A cali-pri-kns.default -m comment --comment "cali:WMSw8BmYOknRHfsz" -m comment --comment "Profile kns.default ingress" -j MARK --set-xmark 0x10000/0x10000
> -A cali-pri-kns.default -m comment --comment "cali:z015TBt2tO4F28NC" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-pri-kns.kube-system -m comment --comment "cali:J1TyxtHWd0qaBGK-" -m comment --comment "Profile kns.kube-system ingress" -j MARK --set-xmark 0x10000/0x10000
> -A cali-pri-kns.kube-system -m comment --comment "cali:QIB6k7eEKdIg73Jp" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-pri-ksa.default.default -m comment --comment "cali:PrckJA84jX_kGp99" -m comment --comment "Profile ksa.default.default ingress"
> -A cali-pro-_hNSGmJYNT8uLIzxesP -m comment --comment "cali:WHw0aH5lHwGz91dL" -m comment --comment "Profile ksa.kube-system.default egress"
> -A cali-pro-kns.default -m comment --comment "cali:Vr81boRqq4V77Sg8" -m comment --comment "Profile kns.default egress" -j MARK --set-xmark 0x10000/0x10000
> -A cali-pro-kns.default -m comment --comment "cali:2CkTlvGj1F9ZRYXl" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-pro-kns.kube-system -m comment --comment "cali:tgOR2S8DVHZW3F1M" -m comment --comment "Profile kns.kube-system egress" -j MARK --set-xmark 0x10000/0x10000
> -A cali-pro-kns.kube-system -m comment --comment "cali:HVEEtYPJsiGRXCIt" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-pro-ksa.default.default -m comment --comment "cali:bUZzZcietq9v5Ybq" -m comment --comment "Profile ksa.default.default egress"
> -A cali-to-wl-dispatch -o calib3c61c3cba9 -m comment --comment "cali:6rW7D4-zGjk_2cCD" -g cali-tw-calib3c61c3cba9
> -A cali-to-wl-dispatch -o calie5da25cfd1a -m comment --comment "cali:r60LIQJKeOfFbGL5" -g cali-tw-calie5da25cfd1a
> -A cali-to-wl-dispatch -m comment --comment "cali:o-l5VddE4yTV5GL2" -m comment --comment "Unknown interface" -j DROP
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:ERriy_LNlpHE7Zpa" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:319XBxiHdVGHvs2I" -m conntrack --ctstate INVALID -j DROP
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:nChoIqPtK8-J0Tnh" -j MARK --set-xmark 0x0/0x10000
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:NSSJcrC8rQgyfE3o" -j cali-pri-kns.default
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:RpCToeb7ZQGnRoqQ" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:sj37TD8cnzxYCzNk" -j cali-pri-ksa.default.default
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:DJaOVT9ZF-RYc9gH" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:zyPfP22BuZI-2_kX" -m comment --comment "Drop if no profiles matched" -j DROP
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:cUpiZgaRMdNMAzOu" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:Nz6EJpcT4V1-Dg4G" -m conntrack --ctstate INVALID -j DROP
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:SKDInyWWm4fssZEv" -j MARK --set-xmark 0x0/0x10000
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:qM3GnqOh8tZhpwvH" -j cali-pri-kns.kube-system
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:OvJg8heW9IZj0DOC" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:49ySDN8EtQ1raSkw" -j cali-pri-_hNSGmJYNT8uLIzxesP
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:lZeBMfryljPUjaof" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:ZD5L3VwHHdmAtnGC" -m comment --comment "Drop if no profiles matched" -j DROP
208,209c195,196
< # Completed on Wed Mar 16 07:49:08 2022
< # Generated by iptables-save v1.8.7 on Wed Mar 16 07:49:08 2022
---
> # Completed on Wed Mar 16 07:55:33 2022
> # Generated by iptables-save v1.8.7 on Wed Mar 16 07:55:33 2022
269a257,258
> -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
> -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
276,277d264
< -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
< -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
291c278
< # Completed on Wed Mar 16 07:49:08 2022
---
> # Completed on Wed Mar 16 07:55:33 2022
but contains rules for non-existing interface names:
This seems to suggest that calico/node is working with out-of-date information. Are your calico/node pods "Ready"?
Symptoms seem to indicate that Calico is missing updates from the data store (both those notifying it of new pods and those notifying it of old pods that no longer exist).
This seems to suggest that calico/node is working with out-of-date information. Are your calico/node pods "Ready"?
Yes, all calico-node pods were running fine - not reporting any issues, not even in logs.
It may be important to note that we managed to reproduce the issue only on Flatcar Container Linux. On other distributions the same scenario works fine.
v3.22.0 had a nasty bug where calico could end up with an outdated view of the world. Fixed by https://github.com/projectcalico/calico/pull/5665 in v3.22.1 - might be worth upgrading?
@lwr20 the symptoms sound similar, but that bug was actually fixed in v3.21.3 and isn't present in v3.22.
Thank you for the correction.
Right, I can also confirm that the issue is present on v3.22.1 as well.
If it helps, I have a similar issue on AlmaLinux 8.
The dataplane backend is also nftables, and the veth interfaces in Calico's stored state no longer exist on the host.
If I remove Calico as my NetworkPolicy provider, the cluster recovers.
Chain cali-from-wl-dispatch (2 references)
pkts bytes target prot opt in out source destination
1634 135K cali-fw-cali0eacf697eec all -- cali0eacf697eec * 0.0.0.0/0 0.0.0.0/0 [goto] /* cali:jfxMCpNf8Nj1KM-K */
1666 138K cali-fw-cali9b778b31de0 all -- cali9b778b31de0 * 0.0.0.0/0 0.0.0.0/0 [goto] /* cali:Z5r7QQDW3XChII0Z */
1529 111K cali-fw-caliec620394ae2 all -- caliec620394ae2 * 0.0.0.0/0 0.0.0.0/0 [goto] /* cali:j5kopEPZZL4XyLX1 */
8054 676K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:jP5qeZOJds6jo3_6 */ /* Unknown interface */
Just wanted to confirm that these rules do appear to be out of date, and don't match the pods running on the node.
Are you using calico/typha in this cluster?
I can see an event indicating that Felix receives a delete event for the pod:
2022-03-15 17:20:57.208 [INFO][47] felix/endpoint_mgr.go 668: Workload removed, deleting its chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/tmp-shell", EndpointId:"eth0"}
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 537: Queuing deletion of chain. chainName="cali-tw-calie5da25cfd1a" ipVersion=0x4 table="filter"
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 537: Queuing deletion of chain. chainName="cali-fw-calie5da25cfd1a" ipVersion=0x4 table="filter"
2022-03-15 17:20:57.208 [INFO][47] felix/endpoint_mgr.go 545: Workload removed, deleting old state. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/tmp-shell", EndpointId:"eth0"}
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 506: Queueing update of chain. chainName="cali-from-wl-dispatch" ipVersion=0x4 table="filter"
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 591: Chain no longer referenced, marking it for removal chainName="cali-fw-calie5da25cfd1a"
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 506: Queueing update of chain. chainName="cali-to-wl-dispatch" ipVersion=0x4 table="filter"
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 591: Chain no longer referenced, marking it for removal chainName="cali-tw-calie5da25cfd1a"
Just wanted to confirm that these rules do appear to be out of date, and don't match the pods running on the node.
That is correct, these rules appear to be out of date.
Are you using calico/typha in this cluster?
Typha is disabled in the cluster.
Rocky Linux release 8.5 (Green Obsidian) also seems to be affected.
@rastislavs we have encountered the same problem (also using flatcar linux) and found that during the upgrade process something is writing rules to the iptables-legacy tables rather than the iptables-nft one (we got suspicious because of the warning for the presence of iptables-legacy tables and decided to take a closer look).
Because adding rules via both the legacy and the nft iptables interfaces might change the ordering of rules, we disabled auto-detection of the iptables mode and set it statically to nft (by setting the environment variable FELIX_IPTABLESBACKEND to NFT). With auto-detection disabled this way, subsequent upgrades no longer broke the CNI for us.
Can you try that and give feedback?
@kllex thanks for the insight. Calico's auto-detection works by loading the iptables rules from both sources (iptables-legacy and iptables-nft) and comparing the number of rules it sees in each. If it sees more rules in iptables-nft, it chooses that; otherwise it uses iptables-legacy. It would be good to know why that heuristic is failing: do your machines start up with no rules in either table, for example, or are we hitting a different error? Perhaps RH8.5 has moved off iptables completely and is using nft natively (so we don't see any iptables-nft rules)?
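A minimal sketch of the heuristic as described in the comment above, written in Python purely for illustration. This is not Felix's actual Go code; the function name `choose_backend` is hypothetical, and the rule counts are assumed to have been collected already.

```python
def choose_backend(legacy_rules: int, nft_rules: int) -> str:
    """Sketch of the described heuristic: pick nft only when it holds
    strictly more rules than legacy, otherwise fall back to legacy."""
    return "nft" if nft_rules > legacy_rules else "legacy"


# A host whose Kubernetes/Calico rules live in the nft tables.
print(choose_backend(legacy_rules=0, nft_rules=250))

# A host that boots with no rules in either table: neither side "wins",
# so the fallback branch decides.
print(choose_backend(legacy_rules=0, nft_rules=0))
```

Under this reading, a host with zero rules in both tables ends up on the fallback branch regardless of which backend the OS actually uses.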
@fasaxc we did not count rules at startup; instead we compared pre-calico-update vs post-calico-update vs post-reboot. Before the calico update, there are a dozen rules in the iptables-legacy tables, inserted for our node-local-dns setup, and a few hundred in the iptables-nft tables with the usual kubernetes/calico rules.
After the calico update, there were about 80 additional rules in the legacy tables that we usually see in the nft tables. After rebooting, the number of rules in the respective tables returned to what it was before the calico update.
Edit: I have started up a VM with the same image we are using for our control-planes/workers as well and it shows zero rules for either table.
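For anyone who wants to repeat this comparison, here is a rough Python sketch of counting rules in `iptables-save` style output. It assumes that a "rule" is any line starting with `-A`; the helper name `count_rules` and the sample text are made up for illustration and are not taken from our nodes.

```python
def count_rules(save_output: str) -> int:
    """Count rule lines (those starting with "-A") in iptables-save style text.

    Table headers ("*filter"), chain declarations (":INPUT ...") and
    "COMMIT" lines are not counted.
    """
    return sum(1 for line in save_output.splitlines() if line.startswith("-A"))


# Made-up sample, only to show which lines are counted.
sample = """\
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:cali-from-wl-dispatch - [0:0]
-A FORWARD -j cali-from-wl-dispatch
-A cali-from-wl-dispatch -j DROP
COMMIT
"""

print(count_rules(sample))  # 2
```

In practice the input would be the output of `iptables-legacy-save` and `iptables-nft-save` captured on the node, taken before the update, after the update, and after a reboot.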
Setting FELIX_IPTABLESBACKEND to NFT resolved the issue for us as well.
What are the potential risks of forcing NFT backend?
I have started up a VM with the same image we are using for our control-planes/workers as well and it shows zero rules for either table.
I think, in that case, we'd default to legacy mode. It must be a change in recent OS versions to ship with literally zero rules, which breaks our heuristic.
Hey, what is the status of this issue? We recently managed to reproduce it on Ubuntu 22.04 as well.
It appears that Calico is auto-detecting the wrong iptables backend here, which suggests the simple solution of explicitly specifying which iptables version is in use on your hosts.
I'm not sure if there is an improvement to Calico's auto-detection that could also resolve this, but would be happy to hear any suggestions.
@caseydavenport I think the first thing we should try is to update auto-detection. As far as I see, the auto-detection code is located here: https://github.com/projectcalico/calico/blob/8f1ae212ef38e40c2b79c1cb743cd151a82bb45e/felix/environment/feature_detect.go#L286-L326
This code is based on the upstream code; however, the upstream code has been updated since then. It no longer checks whether there are 10 rules; instead, it checks whether legacy or NFT has more rules. I think we should try the same in Calico.
The upstream auto-detection logic is now located here: https://github.com/kubernetes/release/blob/cac877222c829854ed1ec343ec45e79ef1660f8f/images/build/debian-iptables/buster/iptables-wrapper#L19-L40
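To make the difference concrete, here is a rough Python sketch of the two approaches. It is an illustration under assumptions, not a copy of either linked implementation: the function names are hypothetical, the threshold variant is assumed to pick legacy as soon as legacy holds at least 10 rules, and the tie-breaking in both functions is a guess.

```python
def detect_with_threshold(legacy_rules: int, nft_rules: int, threshold: int = 10) -> str:
    """Threshold-style check (ASSUMPTION: legacy wins outright once it has
    at least `threshold` rules, no matter how many rules nft holds)."""
    if legacy_rules >= threshold:
        return "legacy"
    # Below the threshold, fall back to comparing counts (tie-break assumed).
    return "nft" if nft_rules > legacy_rules else "legacy"


def detect_by_comparison(legacy_rules: int, nft_rules: int) -> str:
    """Comparison-only check: whichever backend has more rules wins
    (ASSUMPTION: a tie goes to nft)."""
    return "legacy" if legacy_rules > nft_rules else "nft"


# Illustrative counts: a dozen legacy rules next to a few hundred nft rules.
print(detect_with_threshold(12, 300))
print(detect_by_comparison(12, 300))
```

If the threshold check behaves as assumed here, a handful of unrelated legacy rules is enough to flip the result to legacy, while the comparison-only check follows the backend that holds the bulk of the rules.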