IP tables version auto-detection doesn't always work

Open rastislavs opened this issue 3 years ago • 19 comments

After upgrading Canal from v3.19.1 to v3.22.0, pod communication is broken for newly started pods. It works only for pods that were already running before the upgrade. Rebooting the node resolves the issue.

Expected Behavior

After CNI upgrade, the newly started pods should be able to communicate with any other pod.

Current Behavior

A pod started after the upgrade is not able to communicate with any other pod in the cluster.

For example, pod A (IP 10.244.3.9) tries to ping pod B (IP 10.244.3.10). An iptables trace on the host shows that the packet is dropped by a rule in the cali-from-wl-dispatch chain:

trace id 76f540ab ip raw PREROUTING packet: iif "calie5da25cfd1a" ether saddr 76:0e:b8:ba:fa:08 ether daddr ee:ee:ee:ee:ee:ee ip saddr 10.244.3.9 ip daddr 10.244.3.10 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 60250 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 58087 icmp sequence 1 @th,64,96 857356887478869055311578368
trace id 76f540ab ip raw PREROUTING rule meta l4proto icmp ip daddr 10.244.3.10 counter packets 11 bytes 924 meta nftrace set 1 (verdict continue)
trace id 76f540ab ip raw PREROUTING verdict continue meta mark 0x00040000
trace id 76f540ab ip raw PREROUTING policy accept meta mark 0x00040000
trace id 76f540ab ip mangle PREROUTING packet: iif "calie5da25cfd1a" ether saddr 76:0e:b8:ba:fa:08 ether daddr ee:ee:ee:ee:ee:ee ip saddr 10.244.3.9 ip daddr 10.244.3.10 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 60250 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 58087 icmp sequence 1 @th,64,96 857356887478869055311578368
trace id 76f540ab ip mangle PREROUTING rule # xt_comment counter packets 528464 bytes 651457512 jump cali-PREROUTING (verdict jump cali-PREROUTING)
trace id 76f540ab ip mangle cali-PREROUTING rule # xt_comment counter packets 221165 bytes 17736879 jump cali-from-host-endpoint (verdict jump cali-from-host-endpoint)
trace id 76f540ab ip mangle cali-from-host-endpoint verdict continue meta mark 0x00040000
trace id 76f540ab ip mangle cali-PREROUTING verdict continue meta mark 0x00040000
trace id 76f540ab ip mangle PREROUTING verdict continue meta mark 0x00040000
trace id 76f540ab ip mangle PREROUTING policy accept meta mark 0x00040000
trace id 76f540ab ip nat PREROUTING packet: iif "calie5da25cfd1a" ether saddr 76:0e:b8:ba:fa:08 ether daddr ee:ee:ee:ee:ee:ee ip saddr 10.244.3.9 ip daddr 10.244.3.10 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 60250 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 58087 icmp sequence 1 @th,64,96 857356887478869055311578368
trace id 76f540ab ip nat PREROUTING rule # xt_comment counter packets 8305 bytes 691157 jump cali-PREROUTING (verdict jump cali-PREROUTING)
trace id 76f540ab ip nat cali-PREROUTING rule # xt_comment counter packets 8305 bytes 691157 jump cali-fip-dnat (verdict jump cali-fip-dnat)
trace id 76f540ab ip nat cali-fip-dnat verdict continue meta mark 0x00040000
trace id 76f540ab ip nat cali-PREROUTING verdict continue meta mark 0x00040000
trace id 76f540ab ip nat PREROUTING rule # xt_comment counter packets 8305 bytes 691157 jump KUBE-SERVICES (verdict jump KUBE-SERVICES)
trace id 76f540ab ip nat KUBE-SERVICES verdict continue meta mark 0x00040000
trace id 76f540ab ip nat PREROUTING verdict continue meta mark 0x00040000
trace id 76f540ab ip nat PREROUTING policy accept meta mark 0x00040000
trace id 76f540ab ip mangle FORWARD packet: iif "calie5da25cfd1a" oif "cali8efcee298dd" ether saddr 76:0e:b8:ba:fa:08 ether daddr ee:ee:ee:ee:ee:ee ip saddr 10.244.3.9 ip daddr 10.244.3.10 ip dscp cs0 ip ecn not-ect ip ttl 63 ip id 60250 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 58087 icmp sequence 1 @th,64,96 857356887478869055311578368
trace id 76f540ab ip mangle FORWARD verdict continue meta mark 0x00040000
trace id 76f540ab ip mangle FORWARD policy accept meta mark 0x00040000
trace id 76f540ab ip filter FORWARD packet: iif "calie5da25cfd1a" oif "cali8efcee298dd" ether saddr 76:0e:b8:ba:fa:08 ether daddr ee:ee:ee:ee:ee:ee ip saddr 10.244.3.9 ip daddr 10.244.3.10 ip dscp cs0 ip ecn not-ect ip ttl 63 ip id 60250 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 58087 icmp sequence 1 @th,64,96 857356887478869055311578368
trace id 76f540ab ip filter FORWARD rule # xt_comment counter packets 11739 bytes 3728960 jump cali-FORWARD (verdict jump cali-FORWARD)
trace id 76f540ab ip filter cali-FORWARD rule # xt_comment counter packets 11739 bytes 3728960 # xt_MARK (verdict continue)
trace id 76f540ab ip filter cali-FORWARD rule iifname "cali*" # xt_comment counter packets 9905 bytes 813300 jump cali-from-wl-dispatch (verdict jump cali-from-wl-dispatch)
trace id 76f540ab ip filter cali-from-wl-dispatch rule # xt_comment # xt_comment counter packets 8053 bytes 676400 drop (verdict drop)
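
A trace this long is easier to read when only the dropping rule is extracted. The following is a minimal sketch, assuming the trace lines have the `trace id <id> <family> <table> <chain> rule ... (verdict ...)` layout shown above; the function name `final_drop` is hypothetical and not part of any Calico or nftables tooling.

```python
import re

# Matches the trailing "(verdict <name>" that appears on rule lines of the trace.
VERDICT_RE = re.compile(r"\(verdict (\w+)")


def final_drop(trace_lines):
    """Return (table, chain) of the first rule line whose verdict is drop, or None."""
    for line in trace_lines:
        match = VERDICT_RE.search(line)
        if match and match.group(1) == "drop":
            # Fields: trace id <id> <family> <table> <chain> rule ...
            parts = line.split()
            return parts[4], parts[5]
    return None
```

Feeding it the lines of the trace above would point at the filter table and the cali-from-wl-dispatch chain.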

The packet is dropped by the last rule in the cali-from-wl-dispatch chain:

Chain cali-from-wl-dispatch (2 references)
 pkts bytes target     prot opt in     out     source               destination
 1634  135K cali-fw-cali0eacf697eec  all  --  cali0eacf697eec *       0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:jfxMCpNf8Nj1KM-K */
 1666  138K cali-fw-cali9b778b31de0  all  --  cali9b778b31de0 *       0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:Z5r7QQDW3XChII0Z */
 1529  111K cali-fw-caliec620394ae2  all  --  caliec620394ae2 *       0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:j5kopEPZZL4XyLX1 */
 8054  676K DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:jP5qeZOJds6jo3_6 */ /* Unknown interface */

The cali-from-wl-dispatch chain does not seem to contain rules for the interfaces involved, but it does contain rules for interface names that do not exist on the host:

$ ip route | grep cali
10.244.3.5 dev calib3c61c3cba9 scope link
10.244.3.9 dev calie5da25cfd1a scope link
10.244.3.10 dev cali8efcee298dd scope link
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:fc:08:ef brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet 192.168.1.250/24 metric 1024 brd 192.168.1.255 scope global dynamic eth0
       valid_lft 67408sec preferred_lft 67408sec
    inet6 fe80::f816:3eff:fefc:8ef/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:7b:2a:96:2d brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
7: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether d2:61:31:39:5c:29 brd ff:ff:ff:ff:ff:ff
    inet 10.244.3.0/32 brd 10.244.3.0 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::d061:31ff:fe39:5c29/64 scope link
       valid_lft forever preferred_lft forever
13: calib3c61c3cba9@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-870bafee-f33a-9ff8-78d4-6ded80ae5f55
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
16: nodelocaldns: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether ba:d3:80:2a:6c:ae brd ff:ff:ff:ff:ff:ff
    inet 169.254.20.10/32 brd 169.254.20.10 scope global nodelocaldns
       valid_lft forever preferred_lft forever
    inet6 fe80::b8d3:80ff:fe2a:6cae/64 scope link
       valid_lft forever preferred_lft forever
20: calie5da25cfd1a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-507e2e66-63e7-5601-4b53-d29aaa78e282
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
21: cali8efcee298dd@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-2ff23a01-36db-5e18-4caf-c01ebc59454f
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever

iptables dump: iptables-dump.txt

There is no error in the calico-node logs, only a few INFO logs that might be related:

2022-03-15 12:13:00.734 [INFO][47] felix/route_table.go 1116: Failed to access interface because it doesn't exist. error=Link not found ifaceName="cali0eacf697eec" ifaceRegex="^cali.*" ipVersion=0x4 tableIndex=0
2022-03-15 12:13:00.734 [INFO][47] felix/route_table.go 1184: Failed to get interface; it's down/gone. error=Link not found ifaceName="cali0eacf697eec" ifaceRegex="^cali.*" ipVersion=0x4 tableIndex=0

calico-node logs: calico-node-logs.txt
kube-flannel logs: kube-flannel-logs.txt

Steps to Reproduce (for bugs)

  1. Deploy a cluster with Canal v3.19.1
  2. Start a bunch of pods
  3. Upgrade Canal to v3.22.0
  4. Start a new pod
  5. The new pod is not able to communicate with any other pods

Your Environment

  • Calico version v3.22.0
  • Orchestrator version (e.g. kubernetes, mesos, rkt): k8s v1.21.6
  • Operating System and version: Flatcar Container Linux stable 3033.2.3 (kernel 5.10.102-flatcar)

rastislavs avatar Mar 15 '22 22:03 rastislavs

iptables diff before node reboot (broken pod communication) vs after node reboot (issue resolved):

1c1
< # Generated by iptables-save v1.8.7 on Wed Mar 16 07:49:08 2022
---
> # Generated by iptables-save v1.8.7 on Wed Mar 16 07:55:33 2022
25,26c25,26
< # Completed on Wed Mar 16 07:49:08 2022
< # Generated by iptables-save v1.8.7 on Wed Mar 16 07:49:08 2022
---
> # Completed on Wed Mar 16 07:55:33 2022
> # Generated by iptables-save v1.8.7 on Wed Mar 16 07:55:33 2022
35d34
< -A PREROUTING -d 10.244.3.10/32 -p icmp -j TRACE
46,47c45,46
< # Completed on Wed Mar 16 07:49:08 2022
< # Generated by iptables-save v1.8.7 on Wed Mar 16 07:49:08 2022
---
> # Completed on Wed Mar 16 07:55:33 2022
> # Generated by iptables-save v1.8.7 on Wed Mar 16 07:55:33 2022
49c48
< :INPUT ACCEPT [1306446:184386765]
---
> :INPUT ACCEPT [0:0]
51c50
< :OUTPUT ACCEPT [1318109:136128290]
---
> :OUTPUT ACCEPT [0:0]
70,74c69,72
< :cali-fw-cali0eacf697eec - [0:0]
< :cali-fw-cali9b778b31de0 - [0:0]
< :cali-fw-caliec620394ae2 - [0:0]
< :cali-pri-_PTRGc0U-L5Kz7V6ERW - [0:0]
< :cali-pri-_u2Tn2rSoAPffvE7JO6 - [0:0]
---
> :cali-fw-calib3c61c3cba9 - [0:0]
> :cali-fw-calie5da25cfd1a - [0:0]
> :cali-pri-_hNSGmJYNT8uLIzxesP - [0:0]
> :cali-pri-kns.default - [0:0]
76,77c74,76
< :cali-pro-_PTRGc0U-L5Kz7V6ERW - [0:0]
< :cali-pro-_u2Tn2rSoAPffvE7JO6 - [0:0]
---
> :cali-pri-ksa.default.default - [0:0]
> :cali-pro-_hNSGmJYNT8uLIzxesP - [0:0]
> :cali-pro-kns.default - [0:0]
78a78
> :cali-pro-ksa.default.default - [0:0]
82,84c82,83
< :cali-tw-cali0eacf697eec - [0:0]
< :cali-tw-cali9b778b31de0 - [0:0]
< :cali-tw-caliec620394ae2 - [0:0]
---
> :cali-tw-calib3c61c3cba9 - [0:0]
> :cali-tw-calie5da25cfd1a - [0:0]
118,119d116
< -A KUBE-SERVICES -d 10.99.243.89/32 -p udp -m comment --comment "kube-system/kube-dns-upstream:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
< -A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
121a119,120
> -A KUBE-SERVICES -d 10.99.243.89/32 -p udp -m comment --comment "kube-system/kube-dns-upstream:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
> -A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
139,204c138,191
< -A cali-from-wl-dispatch -i cali0eacf697eec -m comment --comment "cali:jfxMCpNf8Nj1KM-K" -g cali-fw-cali0eacf697eec
< -A cali-from-wl-dispatch -i cali9b778b31de0 -m comment --comment "cali:Z5r7QQDW3XChII0Z" -g cali-fw-cali9b778b31de0
< -A cali-from-wl-dispatch -i caliec620394ae2 -m comment --comment "cali:j5kopEPZZL4XyLX1" -g cali-fw-caliec620394ae2
< -A cali-from-wl-dispatch -m comment --comment "cali:jP5qeZOJds6jo3_6" -m comment --comment "Unknown interface" -j DROP
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:2XfnMIS0vRTpYprb" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:sh3xojT0_fvQdOLZ" -m conntrack --ctstate INVALID -j DROP
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:ilzx_AHB5PSu6SMc" -j MARK --set-xmark 0x0/0x10000
< -A cali-fw-cali0eacf697eec -p udp -m comment --comment "cali:xREVbzdA7xxJ_SG_" -m comment --comment "Drop VXLAN encapped packets originating in workloads" -m multiport --dports 4789 -j DROP
< -A cali-fw-cali0eacf697eec -p ipencap -m comment --comment "cali:tz6v7ZHtg61I9Q55" -m comment --comment "Drop IPinIP encapped packets originating in workloads" -j DROP
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:n2ZWa3lzF0n8PMX4" -j cali-pro-kns.kube-system
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:v5rWp8HwM2XFCZ1c" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:4EHK1l9fKWYf18OI" -j cali-pro-_u2Tn2rSoAPffvE7JO6
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:l5J-VdjvQtO3LrQk" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-cali0eacf697eec -m comment --comment "cali:0bFUwVAPKGeStwsl" -m comment --comment "Drop if no profiles matched" -j DROP
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:QRJCIR98QzvvuyJq" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:ay9LQvb2kTFZOB85" -m conntrack --ctstate INVALID -j DROP
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:ep8IpMUPxzsKay1U" -j MARK --set-xmark 0x0/0x10000
< -A cali-fw-cali9b778b31de0 -p udp -m comment --comment "cali:h792EEUG1ywehDav" -m comment --comment "Drop VXLAN encapped packets originating in workloads" -m multiport --dports 4789 -j DROP
< -A cali-fw-cali9b778b31de0 -p ipencap -m comment --comment "cali:wK-n2UQ42UdcxEhT" -m comment --comment "Drop IPinIP encapped packets originating in workloads" -j DROP
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:usXqpuRNvvg8arve" -j cali-pro-kns.kube-system
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:zCuDlID2OkmUTc-N" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:MvzzXy8GFn60l8Da" -j cali-pro-_u2Tn2rSoAPffvE7JO6
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:83E5fZh7LW22uW10" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-cali9b778b31de0 -m comment --comment "cali:VTLnDAOcQy2MZIOt" -m comment --comment "Drop if no profiles matched" -j DROP
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:uzV2WeiGT17quDoH" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:eARjyNdXSAGkE7ms" -m conntrack --ctstate INVALID -j DROP
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:KHjIF3x506U0IyzQ" -j MARK --set-xmark 0x0/0x10000
< -A cali-fw-caliec620394ae2 -p udp -m comment --comment "cali:ghu0c_u8iQi8tYiX" -m comment --comment "Drop VXLAN encapped packets originating in workloads" -m multiport --dports 4789 -j DROP
< -A cali-fw-caliec620394ae2 -p ipencap -m comment --comment "cali:Il5cEDn7POq5E7rk" -m comment --comment "Drop IPinIP encapped packets originating in workloads" -j DROP
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:9zK_Jrii60k-rKjX" -j cali-pro-kns.kube-system
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:hnJMKh9MkGSFesMR" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:THcDORIKY6yGRJ6G" -j cali-pro-_PTRGc0U-L5Kz7V6ERW
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:tDQo0kJ41Y0e8caF" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-fw-caliec620394ae2 -m comment --comment "cali:ikKzErH9dIIO-SiN" -m comment --comment "Drop if no profiles matched" -j DROP
< -A cali-pri-kns.kube-system -m comment --comment "cali:zoH5gU6U55FKZxEo" -j MARK --set-xmark 0x10000/0x10000
< -A cali-pri-kns.kube-system -m comment --comment "cali:bcGRIJcyOS9dgBiB" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-pro-kns.kube-system -m comment --comment "cali:-50oJuMfLVO3LkBk" -j MARK --set-xmark 0x10000/0x10000
< -A cali-pro-kns.kube-system -m comment --comment "cali:ztVPKv1UYejNzm1g" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-to-wl-dispatch -o cali0eacf697eec -m comment --comment "cali:VPrTvboHvIFHBSt1" -g cali-tw-cali0eacf697eec
< -A cali-to-wl-dispatch -o cali9b778b31de0 -m comment --comment "cali:WHPbZiPfWHDgLKjB" -g cali-tw-cali9b778b31de0
< -A cali-to-wl-dispatch -o caliec620394ae2 -m comment --comment "cali:aGYX_fDzHdiEIwAG" -g cali-tw-caliec620394ae2
< -A cali-to-wl-dispatch -m comment --comment "cali:-SrWckbj6EyTFkUR" -m comment --comment "Unknown interface" -j DROP
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:7FFwvJFKdOzlIM__" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:amlY9V8ff-q8RYyQ" -m conntrack --ctstate INVALID -j DROP
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:drGUdAs1y8Q3y5Ec" -j MARK --set-xmark 0x0/0x10000
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:fsHIb1BNTAQBxGRg" -j cali-pri-kns.kube-system
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:dSm3fgX2L51Qb27W" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:VsxKf7fD__Ggcu4S" -j cali-pri-_u2Tn2rSoAPffvE7JO6
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:64mCahwSw9FfCjaG" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-cali0eacf697eec -m comment --comment "cali:MinQYCCQ9CUT5LmE" -m comment --comment "Drop if no profiles matched" -j DROP
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:g1NLmquMZ1d3pkgJ" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:TkR9NEf_suBo3eij" -m conntrack --ctstate INVALID -j DROP
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:CLxUE1I4XiSWbEZZ" -j MARK --set-xmark 0x0/0x10000
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:afIWE8H6xc681NB5" -j cali-pri-kns.kube-system
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:8vkx1yUtqWSm3DOd" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:g4_eqvzr95kRNpJn" -j cali-pri-_u2Tn2rSoAPffvE7JO6
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:WIAKY9apNp3bkCL1" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-cali9b778b31de0 -m comment --comment "cali:DuLVD27vb4qTIepy" -m comment --comment "Drop if no profiles matched" -j DROP
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:nnBX6C-bhiIU4J8D" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:xtGCf2UVq3LdACdA" -m conntrack --ctstate INVALID -j DROP
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:DiYMTT6TG0NG7NgN" -j MARK --set-xmark 0x0/0x10000
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:oXYaSq02GB-_1_gP" -j cali-pri-kns.kube-system
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:5cDLKNsjx-YQds-p" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:G2bOfKMYjtePpJ-F" -j cali-pri-_PTRGc0U-L5Kz7V6ERW
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:pFt3_bmCuapd3U2C" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
< -A cali-tw-caliec620394ae2 -m comment --comment "cali:FY4yzFeVB_x3EhUj" -m comment --comment "Drop if no profiles matched" -j DROP
---
> -A cali-from-wl-dispatch -i calib3c61c3cba9 -m comment --comment "cali:-iFqLLPLKdTdHT3y" -g cali-fw-calib3c61c3cba9
> -A cali-from-wl-dispatch -i calie5da25cfd1a -m comment --comment "cali:GxF3GskLi8lTypH5" -g cali-fw-calie5da25cfd1a
> -A cali-from-wl-dispatch -m comment --comment "cali:YyQsK4pOzpK5SJvx" -m comment --comment "Unknown interface" -j DROP
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:bFkWuXTY4cHRsWCw" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:Yz28x-xlZn3iK1dm" -m conntrack --ctstate INVALID -j DROP
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:hNHUX-t0brxvobsq" -j MARK --set-xmark 0x0/0x10000
> -A cali-fw-calib3c61c3cba9 -p udp -m comment --comment "cali:zcm2POu6Mo0LAV1k" -m comment --comment "Drop VXLAN encapped packets originating in workloads" -m multiport --dports 4789 -j DROP
> -A cali-fw-calib3c61c3cba9 -p ipencap -m comment --comment "cali:DwmLW-ADFbvCVMfq" -m comment --comment "Drop IPinIP encapped packets originating in workloads" -j DROP
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:AdEJ83cx93vOFgSp" -j cali-pro-kns.default
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:cjz_9Zh_TDJTWzWk" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:0oinIgeZuK--S8aG" -j cali-pro-ksa.default.default
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:0_TxaYf8xQRbbTfr" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-fw-calib3c61c3cba9 -m comment --comment "cali:PcQjLe_fD4RDZOPL" -m comment --comment "Drop if no profiles matched" -j DROP
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:DQTj2Ly76ps1vAfV" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:CXvRiR4x2LDFKt8A" -m conntrack --ctstate INVALID -j DROP
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:WE5TOgqrYenB8dk5" -j MARK --set-xmark 0x0/0x10000
> -A cali-fw-calie5da25cfd1a -p udp -m comment --comment "cali:Fyu95QLIPxazayxx" -m comment --comment "Drop VXLAN encapped packets originating in workloads" -m multiport --dports 4789 -j DROP
> -A cali-fw-calie5da25cfd1a -p ipencap -m comment --comment "cali:IlO35GnXY-zk9pDG" -m comment --comment "Drop IPinIP encapped packets originating in workloads" -j DROP
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:8aFJF7iWEDcpQkmW" -j cali-pro-kns.kube-system
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:OFSNaS_SPjCLBu_h" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:RQN0NspAfySpAEOJ" -j cali-pro-_hNSGmJYNT8uLIzxesP
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:TneHhNn7F7DX_fh9" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-fw-calie5da25cfd1a -m comment --comment "cali:alxKi3OQLkF6h1nM" -m comment --comment "Drop if no profiles matched" -j DROP
> -A cali-pri-_hNSGmJYNT8uLIzxesP -m comment --comment "cali:k9ZghIA0HRR2xDY1" -m comment --comment "Profile ksa.kube-system.default ingress"
> -A cali-pri-kns.default -m comment --comment "cali:WMSw8BmYOknRHfsz" -m comment --comment "Profile kns.default ingress" -j MARK --set-xmark 0x10000/0x10000
> -A cali-pri-kns.default -m comment --comment "cali:z015TBt2tO4F28NC" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-pri-kns.kube-system -m comment --comment "cali:J1TyxtHWd0qaBGK-" -m comment --comment "Profile kns.kube-system ingress" -j MARK --set-xmark 0x10000/0x10000
> -A cali-pri-kns.kube-system -m comment --comment "cali:QIB6k7eEKdIg73Jp" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-pri-ksa.default.default -m comment --comment "cali:PrckJA84jX_kGp99" -m comment --comment "Profile ksa.default.default ingress"
> -A cali-pro-_hNSGmJYNT8uLIzxesP -m comment --comment "cali:WHw0aH5lHwGz91dL" -m comment --comment "Profile ksa.kube-system.default egress"
> -A cali-pro-kns.default -m comment --comment "cali:Vr81boRqq4V77Sg8" -m comment --comment "Profile kns.default egress" -j MARK --set-xmark 0x10000/0x10000
> -A cali-pro-kns.default -m comment --comment "cali:2CkTlvGj1F9ZRYXl" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-pro-kns.kube-system -m comment --comment "cali:tgOR2S8DVHZW3F1M" -m comment --comment "Profile kns.kube-system egress" -j MARK --set-xmark 0x10000/0x10000
> -A cali-pro-kns.kube-system -m comment --comment "cali:HVEEtYPJsiGRXCIt" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-pro-ksa.default.default -m comment --comment "cali:bUZzZcietq9v5Ybq" -m comment --comment "Profile ksa.default.default egress"
> -A cali-to-wl-dispatch -o calib3c61c3cba9 -m comment --comment "cali:6rW7D4-zGjk_2cCD" -g cali-tw-calib3c61c3cba9
> -A cali-to-wl-dispatch -o calie5da25cfd1a -m comment --comment "cali:r60LIQJKeOfFbGL5" -g cali-tw-calie5da25cfd1a
> -A cali-to-wl-dispatch -m comment --comment "cali:o-l5VddE4yTV5GL2" -m comment --comment "Unknown interface" -j DROP
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:ERriy_LNlpHE7Zpa" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:319XBxiHdVGHvs2I" -m conntrack --ctstate INVALID -j DROP
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:nChoIqPtK8-J0Tnh" -j MARK --set-xmark 0x0/0x10000
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:NSSJcrC8rQgyfE3o" -j cali-pri-kns.default
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:RpCToeb7ZQGnRoqQ" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:sj37TD8cnzxYCzNk" -j cali-pri-ksa.default.default
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:DJaOVT9ZF-RYc9gH" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-tw-calib3c61c3cba9 -m comment --comment "cali:zyPfP22BuZI-2_kX" -m comment --comment "Drop if no profiles matched" -j DROP
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:cUpiZgaRMdNMAzOu" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:Nz6EJpcT4V1-Dg4G" -m conntrack --ctstate INVALID -j DROP
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:SKDInyWWm4fssZEv" -j MARK --set-xmark 0x0/0x10000
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:qM3GnqOh8tZhpwvH" -j cali-pri-kns.kube-system
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:OvJg8heW9IZj0DOC" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:49ySDN8EtQ1raSkw" -j cali-pri-_hNSGmJYNT8uLIzxesP
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:lZeBMfryljPUjaof" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
> -A cali-tw-calie5da25cfd1a -m comment --comment "cali:ZD5L3VwHHdmAtnGC" -m comment --comment "Drop if no profiles matched" -j DROP
208,209c195,196
< # Completed on Wed Mar 16 07:49:08 2022
< # Generated by iptables-save v1.8.7 on Wed Mar 16 07:49:08 2022
---
> # Completed on Wed Mar 16 07:55:33 2022
> # Generated by iptables-save v1.8.7 on Wed Mar 16 07:55:33 2022
269a257,258
> -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
> -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
276,277d264
< -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
< -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
291c278
< # Completed on Wed Mar 16 07:49:08 2022
---
> # Completed on Wed Mar 16 07:55:33 2022

rastislavs avatar Mar 16 '22 08:03 rastislavs

but contains rules for non-existing interface names:

This seems to suggest that calico/node is working with out-of-date information. Are your calico/node pods "Ready"?

The symptoms seem to indicate that Calico is missing updates from the data store, both those that notify it of new pods and those that notify it of old pods that no longer exist.

caseydavenport avatar Mar 21 '22 23:03 caseydavenport

This seems to suggest that calico/node is working with out-of-date information. Are your calico/node pods "Ready"?

Yes, all calico-node pods were running fine - not reporting any issues, not even in logs.

It may be important to note that we managed to reproduce the issue only on Flatcar Container Linux. On other distributions the same scenario works fine.

rastislavs avatar Mar 22 '22 16:03 rastislavs

v3.22.0 had a nasty bug where calico could end up with an outdated view of the world. Fixed by https://github.com/projectcalico/calico/pull/5665 in v3.22.1 - might be worth upgrading?

lwr20 avatar Mar 22 '22 16:03 lwr20

@lwr20 the symptoms sound similar, but that bug was actually fixed in v3.21.3 and isn't present in v3.22.

caseydavenport avatar Mar 22 '22 16:03 caseydavenport

Thank you for the correction.

lwr20 avatar Mar 22 '22 17:03 lwr20

Right, I can confirm that the issue is also present on v3.22.1.

rastislavs avatar Mar 23 '22 08:03 rastislavs

If it helps, I have a similar issue on AlmaLinux 8.

The dataplane backend there is also nftables, and the veth interfaces that Calico has in its store no longer exist on the host.

If I remove Calico as my NetworkPolicy provider, the cluster recovers.
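
To confirm which backend the host's iptables binary uses, the mode suffix printed by `iptables --version` can be inspected. This is a minimal sketch, assuming iptables 1.8 or later, which prints `(nf_tables)` or `(legacy)` after the version number; it is not how Felix performs its own detection, and the function name is hypothetical.

```python
import re


def iptables_backend(version_output):
    """Classify `iptables --version` output as 'nft', 'legacy', or 'unknown'."""
    match = re.search(r"\((nf_tables|legacy)\)", version_output)
    if not match:
        # Builds older than 1.8 print no mode suffix at all.
        return "unknown"
    return "nft" if match.group(1) == "nf_tables" else "legacy"
```

The input could come from something like `subprocess.run(["iptables", "--version"], capture_output=True, text=True).stdout`, run both on the host and inside the calico-node container so the two results can be compared.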

Cellebyte avatar Mar 23 '22 23:03 Cellebyte

Chain cali-from-wl-dispatch (2 references)
 pkts bytes target     prot opt in     out     source               destination
 1634  135K cali-fw-cali0eacf697eec  all  --  cali0eacf697eec *       0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:jfxMCpNf8Nj1KM-K */
 1666  138K cali-fw-cali9b778b31de0  all  --  cali9b778b31de0 *       0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:Z5r7QQDW3XChII0Z */
 1529  111K cali-fw-caliec620394ae2  all  --  caliec620394ae2 *       0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:j5kopEPZZL4XyLX1 */
 8054  676K DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:jP5qeZOJds6jo3_6 */ /* Unknown interface */

Just wanted to confirm that these rules do appear to be out of date, and don't match the pods running on the node.

Are you using calico/typha in this cluster?

I can see an event indicating that Felix receives a delete event for the pod:

2022-03-15 17:20:57.208 [INFO][47] felix/endpoint_mgr.go 668: Workload removed, deleting its chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/tmp-shell", EndpointId:"eth0"}
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 537: Queuing deletion of chain. chainName="cali-tw-calie5da25cfd1a" ipVersion=0x4 table="filter"
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 537: Queuing deletion of chain. chainName="cali-fw-calie5da25cfd1a" ipVersion=0x4 table="filter"
2022-03-15 17:20:57.208 [INFO][47] felix/endpoint_mgr.go 545: Workload removed, deleting old state. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/tmp-shell", EndpointId:"eth0"}
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 506: Queueing update of chain. chainName="cali-from-wl-dispatch" ipVersion=0x4 table="filter"
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 591: Chain no longer referenced, marking it for removal chainName="cali-fw-calie5da25cfd1a"
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 506: Queueing update of chain. chainName="cali-to-wl-dispatch" ipVersion=0x4 table="filter"
2022-03-15 17:20:57.208 [INFO][47] felix/table.go 591: Chain no longer referenced, marking it for removal chainName="cali-tw-calie5da25cfd1a"

caseydavenport avatar Mar 31 '22 22:03 caseydavenport

Chain cali-from-wl-dispatch (2 references)
 pkts bytes target     prot opt in     out     source               destination
 1634  135K cali-fw-cali0eacf697eec  all  --  cali0eacf697eec *       0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:jfxMCpNf8Nj1KM-K */
 1666  138K cali-fw-cali9b778b31de0  all  --  cali9b778b31de0 *       0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:Z5r7QQDW3XChII0Z */
 1529  111K cali-fw-caliec620394ae2  all  --  caliec620394ae2 *       0.0.0.0/0            0.0.0.0/0           [goto]  /* cali:j5kopEPZZL4XyLX1 */
 8054  676K DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:jP5qeZOJds6jo3_6 */ /* Unknown interface */

Just wanted to confirm that these rules do appear to be out of date, and don't match the pods running on the node.

That is correct, these rules appear to be out of date.

Are you using calico/typha in this cluster?

Typha is disabled in the cluster.

rastislavs avatar Apr 01 '22 09:04 rastislavs

Rocky Linux release 8.5 (Green Obsidian) also seems to be affected.

rastislavs avatar May 11 '22 07:05 rastislavs

@rastislavs we have encountered the same problem (also using Flatcar Linux) and found that during the upgrade process something writes rules to the iptables-legacy tables rather than the iptables-nft ones. We got suspicious because of the warning about the presence of iptables-legacy tables and decided to take a closer look.

Because adding rules through both the legacy and the nft iptables interfaces can change the ordering of rules, we disabled auto-detection of the iptables mode and set it statically to nft (by setting the environment variable FELIX_IPTABLESBACKEND to NFT). After disabling auto-detection this way, subsequent upgrades no longer broke the CNI for us.
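For reference, a minimal sketch of what that setting can look like. The surrounding layout is an assumption (an `env` list on the calico-node container of the DaemonSet); only the variable name and value come from our setup, so adapt it to your own manifest.

```yaml
# Fragment of the calico-node container spec.
# Assumption: your manifest exposes an env list on this container.
env:
  - name: FELIX_IPTABLESBACKEND
    value: "NFT"   # pin the backend instead of relying on auto-detection
```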

Can you try that and give feedback?

kllex avatar May 11 '22 08:05 kllex

@kllex thanks for the insight. Calico autodetection works by loading the iptables rules from both sources (iptables-legacy and iptables-nft) and comparing the number of rules that it sees. If it sees more rules in iptables-nft then it'll choose that, otherwise it'll use iptables-legacy. It would be good to know why that heuristic is failing: do your machines start up with no rules in either table, for example, or are we hitting a different error? Perhaps RH8.5 has moved off iptables completely and is using nft natively (so we don't see any iptables-nft rules)?
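A rough sketch of that comparison in Go, for illustration only. This is not the actual Felix code: `countRules` and `chooseBackend` are made-up names, and the input is assumed to be iptables-save style text in which rule lines start with `-`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countRules counts rule lines in iptables-save style output.
// Assumption: rule lines start with "-" (for example "-A INPUT ...");
// table headers, chain declarations, comments and COMMIT are ignored.
func countRules(saveOutput string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(saveOutput))
	for sc.Scan() {
		if strings.HasPrefix(strings.TrimSpace(sc.Text()), "-") {
			n++
		}
	}
	return n
}

// chooseBackend returns "nft" only when the nft dump holds strictly more
// rules than the legacy dump; every other case, including a tie, is "legacy".
func chooseBackend(legacyDump, nftDump string) string {
	if countRules(nftDump) > countRules(legacyDump) {
		return "nft"
	}
	return "legacy"
}

func main() {
	legacy := "*filter\n:INPUT ACCEPT [0:0]\n-A INPUT -j ACCEPT\nCOMMIT\n"
	nft := "*filter\n:INPUT ACCEPT [0:0]\n-A INPUT -j cali-INPUT\n-A FORWARD -j cali-FORWARD\nCOMMIT\n"

	fmt.Println(chooseBackend(legacy, nft)) // prints "nft" (2 rules vs 1)
	fmt.Println(chooseBackend("", ""))      // prints "legacy" (0 vs 0 is a tie)
}
```

Note that with this shape of check, a host with no rules in either table ends up on the legacy side, since nothing is strictly greater.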

fasaxc avatar May 11 '22 08:05 fasaxc

@fasaxc we have not counted the rules at startup, but rather compared the state before the Calico update, after the update, and after a reboot. Before the update there are a dozen rules in the iptables-legacy tables, inserted for our node-local-dns setup, and a few hundred in the iptables-nft tables with the usual Kubernetes/Calico rules.

After the Calico update there were about 80 additional rules in the legacy tables that we usually see in the nft tables. After rebooting, the number of rules in the respective tables returned to what it was before the update.

Edit: I have started up a VM with the same image we are using for our control-planes/workers as well and it shows zero rules for either table.

kllex avatar May 11 '22 08:05 kllex

Setting FELIX_IPTABLESBACKEND to NFT resolved the issue for us as well. What are the potential risks of forcing the NFT backend?

rastislavs avatar May 11 '22 13:05 rastislavs

I have started up a VM with the same image we are using for our control-planes/workers as well and it shows zero rules for either table.

I think, in that case, we'd default to legacy mode. Must be a change in recent OS versions to ship with literally zero rules. Breaks our heuristic.

fasaxc avatar May 11 '22 13:05 fasaxc

Hey, what is the status of this issue? We recently managed to reproduce it on Ubuntu 22.04 as well.

rastislavs avatar Oct 03 '22 14:10 rastislavs

It appears that Calico is auto-detecting the wrong iptables backend here, which suggests the simple solution of explicitly specifying which iptables version is in use on your hosts.

I'm not sure if there is an improvement to Calico's auto-detection that could also resolve this, but would be happy to hear any suggestions.

caseydavenport avatar Oct 10 '22 21:10 caseydavenport

@caseydavenport I think the first thing we should try is to update auto-detection. As far as I see, the auto-detection code is located here: https://github.com/projectcalico/calico/blob/8f1ae212ef38e40c2b79c1cb743cd151a82bb45e/felix/environment/feature_detect.go#L286-L326

This code is based on the upstream code; however, the upstream code has been updated since then. It no longer checks whether there are 10 rules; instead, it checks whether legacy or NFT has more rules. I think we should try the same in Calico.

The upstream auto-detection logic is now located here: https://github.com/kubernetes/release/blob/cac877222c829854ed1ec343ec45e79ef1660f8f/images/build/debian-iptables/buster/iptables-wrapper#L19-L40
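To illustrate the difference between the two checks, here is a small Go sketch. It is not copied from feature_detect.go or from the iptables-wrapper script. The function names are hypothetical, and I am assuming the 10-rule check means "at least 10 legacy rules selects legacy". The rule counts in `main` are made-up example values.

```go
package main

import "fmt"

// legacyThreshold is the assumed cut-off of the older check:
// this many legacy rules is taken as proof that legacy mode is in use.
const legacyThreshold = 10

// detectWithThreshold models the older behaviour (assumption): enough legacy
// rules short-circuits to "legacy" regardless of how many nft rules exist.
func detectWithThreshold(legacyRules, nftRules int) string {
	if legacyRules >= legacyThreshold {
		return "legacy"
	}
	if nftRules > legacyRules {
		return "nft"
	}
	return "legacy"
}

// detectByComparison models the newer behaviour described above:
// whichever side has more rules wins, with ties going to legacy.
func detectByComparison(legacyRules, nftRules int) string {
	if nftRules > legacyRules {
		return "nft"
	}
	return "legacy"
}

func main() {
	// Hypothetical host: a handful of legacy rules from another component,
	// and a few hundred nft rules from Kubernetes and Calico.
	legacyRules, nftRules := 12, 300

	fmt.Println(detectWithThreshold(legacyRules, nftRules)) // prints "legacy"
	fmt.Println(detectByComparison(legacyRules, nftRules))  // prints "nft"
}
```

Under these assumptions, a small number of unrelated legacy rules is enough to tip the threshold check to legacy, while the plain comparison still picks nft.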

xmudrii avatar Oct 11 '22 11:10 xmudrii