HostPort of a pod/container can't be accessed from other nodes in eBPF mode
After enabling Calico's eBPF mode, I created a pod with a hostPort setting. I expected clients on every node and outside the cluster to be able to access hostIP:hostPort. Instead, it can be accessed only from the host where the pod is located, not from other nodes or from outside the cluster.
Expected Behavior
Clients on every node and outside the cluster can access hostIP:hostPort.
Current Behavior
It can be accessed only from the host where the pod is located. It can't be accessed from other nodes or from outside the cluster. When I run "curl hostIP:hostPort", it just hangs.
Possible Solution
Steps to Reproduce (for bugs)
- stop kube-proxy on every node
- restart Linux
- enable the eBPF dataplane:
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfEnabled": true}}'
- recreate all Calico pods and test pods.
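A minimal sketch of a test pod for the last step, assuming nothing about the reporter's actual workload: the pod name `hostport-test`, the container name `web`, the `nginx` image, and host port 8080 are all hypothetical placeholders chosen for illustration.

```shell
# Print a Pod manifest that maps the given host port to container port 80.
# All names here (hostport-test, web, nginx) are illustrative assumptions.
hostport_pod_manifest() {
  local host_port="$1"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hostport-test
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
      hostPort: ${host_port}
EOF
}

# Usage (needs a cluster, so left as comments):
#   hostport_pod_manifest 8080 | kubectl apply -f -
#   curl --max-time 5 http://<hostIP>:8080/   # run from a different node
```

With the behaviour described in this report, the curl from a different node would be expected to time out, while the same request from the pod's own node succeeds.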
Context
"curl PodIP:containerPort" is always successful from other nodes.
Your Environment
- Calico version: v3.23.3
- Orchestrator version: kubernetes 1.23.9
- Operating System and version: CentOS 8.2
- Link to your project (optional):
@tomastigera
See today's discussion in the "networking" channel of the "Calico Users" Slack team for a similar report.
In my case, I'm also using Calico version 3.23.3, Kubernetes 1.24.4, Flatcar Container Linux 3277.1.1, provisioned by kOps, running in AWS EC2.
Using tcpdump, I can see the following packet flow, with the client designated as C, the target container in the pod as T, and the EC2 instance hosting the container H. The container's requested host port is hp; its corresponding container port is cp.
- C:* → H:hp on eth0 [SYN]
This is the incoming packet from the client, trying to open the connection.
- C:* → T:cp on cali* [SYN]
This packet is forwarded via DNAT.
- T:cp → C:* on cali* [SYN-ACK]
This is the container responding to the connection request.
There are no more packets exchanged in that conversation. Between one half and a full second later, the client sends the opening packet again, and the same three-packet pattern repeats.
On the client's machine (C above), using tcpdump to inspect the traffic, I see the client's outbound packets trying to open the connection, but no packets make it back from either the EC2 instance H or the pod T.
The EC2 instance's ENI's source/destination checks are disabled, and the ENI's security group rules allow all egress using both TCP and UDP.
On the server machine H, the following iptables rules mention the host port (29888 in my example, with 100.110.179.72 as the target pod's IP address):
-A CNI-DN-7afd689b74012ed42abdd -s 100.110.179.72/32 -p tcp -m tcp --dport 29888 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7afd689b74012ed42abdd -s 127.0.0.1/32 -p tcp -m tcp --dport 29888 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7afd689b74012ed42abdd -p tcp -m tcp --dport 29888 -j DNAT --to-destination 100.110.179.72:29888
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"f97edae62bbd47059576974a4f8fd0204e17193acd9ca57bf9e369b5400c53eb\"" -m multiport --dports 29888 -j CNI-DN-7afd689b74012ed42abdd
Using iptables-save -c, I can see the counters for these rules incrementing steadily, showing that the rules are hit by this packet flow.
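One way to repeat that counter check is sketched below. The helper name `hostport_rules_hit` is hypothetical; it only filters text in the `[packets:bytes] -A CHAIN ...` form that `iptables-save -c` prints, and it matches a single `--dport`/`--dports` value rather than comma-separated multiport lists.

```shell
# Read `iptables-save -c` output on stdin and print the rules that mention
# the given destination port AND have a non-zero packet counter.
hostport_rules_hit() {
  local port="$1"
  awk -v port="$port" '
    # Rule lines start with a counter pair: [packets:bytes] -A CHAIN ...
    /^\[[0-9]+:[0-9]+\] -A / {
      # Keep only rules for exactly this port (not, say, 30010 for 3001).
      if ($0 !~ ("--dports? " port "( |$)")) next
      # $1 is "[packets:bytes]"; drop the "[" and take the packet count.
      split(substr($1, 2), counts, ":")
      if (counts[1] + 0 > 0) print
    }'
}

# Usage (needs root on the node, so left as a comment):
#   iptables-save -c | hostport_rules_hit 29888
```

Running it twice a few seconds apart while the client retries shows whether the packet counts on the DNAT rules are still climbing.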
What could be blocking these return packets from leaving the target container's EC2 instance?
Per Tomas's advice, setting the "FELIX_BPFHostConntrackBypass" environment variable on the "calico-node" containers to "false" alleviates the problem.
@gongzh the solution is to set FELIX_BPFHostConntrackBypass=false. You have to do it as an env variable for now, as it is missing from the Felix configuration resource. That will be fixed.
@tomastigera @seh many thanks and looking forward to new config BPFDisableLinuxConntrack.
@seh Can you please share the complete iptables rules that you got from iptables-save -c?
Here's the output from iptables-save -c run on a machine hosting at least one pod using a host port. In this case, look for TCP port 3001, used by a pod with IP address 100.103.226.208.
"iptables-save -c" output
# Generated by iptables-save v1.8.7 on Mon Sep 19 14:10:22 2022
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KUBE-IPTABLES-HINT - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
COMMIT
# Completed on Mon Sep 19 14:10:22 2022
# Generated by iptables-save v1.8.7 on Mon Sep 19 14:10:22 2022
*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:cali-OUTPUT - [0:0]
:cali-PREROUTING - [0:0]
:cali-rpf - [0:0]
:cali-to-host-endpoint - [0:0]
:cali-untracked-flows - [0:0]
:cali-untracked-policy - [0:0]
[11857277:21309286804] -A PREROUTING -m comment --comment "cali:6gwbT8clXdHdC1b1" -j cali-PREROUTING
[3386567:11879945822] -A OUTPUT -m comment --comment "cali:tVnHkvAo15HuiPy0" -j cali-OUTPUT
[3386567:11879945822] -A cali-OUTPUT -m comment --comment "cali:njdnLwYeGqBJyMxW" -j MARK --set-xmark 0x0/0xf0000
[3386567:11879945822] -A cali-OUTPUT -m comment --comment "cali:rz86uTUcEZAfFsh7" -j cali-to-host-endpoint
[0:0] -A cali-OUTPUT -m comment --comment "cali:bLB0nfIOyylxkMRl" -m mark --mark 0x10000/0x10000 -j MARK --set-xmark 0x3000000/0xffffffff
[387:58925] -A cali-OUTPUT -m comment --comment "cali:POF4qT4U5ax-xBuY" -m mark --mark 0x3000000 -j ACCEPT
[11857277:21309286804] -A cali-PREROUTING -m comment --comment "cali:6mAAmzbAxBnZjA7b" -j cali-rpf
[5308331:16323287670] -A cali-PREROUTING -m comment --comment "cali:Xd1cllrbNxZeQidT" -m addrtype ! --dst-type LOCAL -g cali-untracked-flows
[0:0] -A cali-PREROUTING -m comment --comment "cali:13gv11A-lA-bpCYg" -m comment --comment "Jump to target for packets with Bypass mark" -m mark --mark 0x3000000 -g cali-untracked-policy
[83:31785] -A cali-rpf -m comment --comment "cali:6xGtAA2dcFGJ69f4" -m comment --comment "Skip RPF if requested" -m mark --mark 0x3400000/0x3f00000 -j RETURN
[1144020:497651066] -A cali-rpf -m comment --comment "cali:Rtjbbkp32IBYWoqM" -m comment --comment "Skip RPF on packets returning from tunnel to the client" -m mark --mark 0x3300000/0x3f00000 -j RETURN
[0:0] -A cali-rpf -m comment --comment "cali:K2Ow3gMMq7e4Oklj" -m mark --mark 0x3300000/0x1ff00000 -m rpfilter --validmark --accept-local -j RETURN
[0:0] -A cali-rpf -m comment --comment "cali:VsYL4esKcy1qDtDI" -m mark --mark 0x1000000/0x1000000 -m rpfilter --validmark --invert -j DROP
[0:0] -A cali-untracked-policy -m comment --comment "cali:GmTHf-cZvJXoOqUt" -j MARK --set-xmark 0x0/0x0
[0:0] -A cali-untracked-policy -m comment --comment "cali:QJsxuYT1xa6WFzwv" -j NOTRACK
COMMIT
# Completed on Mon Sep 19 14:10:22 2022
# Generated by iptables-save v1.8.7 on Mon Sep 19 14:10:22 2022
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:cali-to-wl-dispatch - [0:0]
:cali-to-wl-dispatch-5 - [0:0]
[1172097:1019988079] -A INPUT -m comment --comment "cali:zkuE8qdwsVpH6Kd2" -m comment --comment "Accept packets from flows that pre-date BPF." -m mark --mark 0x5000000/0x5000000 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
[131:40990] -A INPUT -m comment --comment "cali:XQL0mC-L6wldZdgN" -m comment --comment "Drop packets from unknown flows." -m mark --mark 0x5000000/0x5000000 -j DROP
[288840:22965070] -A INPUT -i cali+ -m comment --comment "cali:pbFdTFCLcV-MVLSS" -m mark --mark 0x1000000/0x1000000 -j ACCEPT
[0:0] -A INPUT -i cali+ -m comment --comment "cali:u_TyW7ph8QsYnThE" -m mark ! --mark 0x1000000/0x1000000 -j DROP
[2365445:3596958553] -A INPUT -j KUBE-FIREWALL
[4869408:5466201263] -A FORWARD -m comment --comment "cali:umcmOn0WnTNOKJrp" -m comment --comment "Pre-approved by BPF programs." -m mark --mark 0x3000000/0x3000000 -j ACCEPT
[0:0] -A FORWARD -i cali+ -m comment --comment "cali:NnQ109Z-tVFkJGc1" -m comment --comment "From workload without BPF seen mark" -m mark ! --mark 0x1000000/0x1000000 -j DROP
[2759095:351663751] -A FORWARD -m comment --comment "cali:YmI_zfAgHIHbINEV" -m comment --comment "Mark pre-established flows." -m conntrack --ctstate RELATED,ESTABLISHED -j MARK --set-xmark 0x8000000/0x8000000
[2831498:392219529] -A FORWARD -o cali+ -m comment --comment "cali:-EFgmtwMJVO64q9s" -m comment --comment "To workload, check workload is known." -j cali-to-wl-dispatch
[331833:10815138912] -A FORWARD -i cali+ -m comment --comment "cali:wP1i1sEU71uRzM5d" -m comment --comment "To workload, mark has already been verified." -j ACCEPT
[2853447:1029960252] -A OUTPUT -m comment --comment "cali:jV0u_0SICA6NQt9x" -m comment --comment "Mark pre-established host flows." -m conntrack --ctstate RELATED,ESTABLISHED -j MARK --set-xmark 0x8000000/0x8000000
[3388442:11880278673] -A OUTPUT -j KUBE-FIREWALL
[0:0] -A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
[0:0] -A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
[1708669:146634066] -A cali-to-wl-dispatch -o cali5+ -m comment --comment "cali:s-h3O_VJJvyg0nUF" -g cali-to-wl-dispatch-5
[0:0] -A cali-to-wl-dispatch -o calid9ab8c033f9 -m comment --comment "cali:yP0JWT3bsoz6XJhH" -j ACCEPT
[1122829:245585463] -A cali-to-wl-dispatch -o calie2295745132 -m comment --comment "cali:voUbXKKnPFRrVLxK" -j ACCEPT
[0:0] -A cali-to-wl-dispatch -m comment --comment "cali:YRYUSAaB3tib9T3-" -m comment --comment "Unknown interface" -j DROP
[1688220:135901471] -A cali-to-wl-dispatch-5 -o cali51fba186272 -m comment --comment "cali:pUR7T3sVzy_bnssZ" -j ACCEPT
[20449:10732595] -A cali-to-wl-dispatch-5 -o cali5f35fcc4c93 -m comment --comment "cali:8L1wwuUAT1IJDURU" -j ACCEPT
[0:0] -A cali-to-wl-dispatch-5 -m comment --comment "cali:u_U11TQABMOZ13O3" -m comment --comment "Unknown interface" -j DROP
COMMIT
# Completed on Mon Sep 19 14:10:22 2022
# Generated by iptables-save v1.8.7 on Mon Sep 19 14:10:22 2022
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:CNI-DN-d1d9e23f797943d48e271 - [0:0]
:CNI-HOSTPORT-DNAT - [0:0]
:CNI-HOSTPORT-MASQ - [0:0]
:CNI-HOSTPORT-SETMARK - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-POSTROUTING - [0:0]
:cali-POSTROUTING - [0:0]
:cali-fip-snat - [0:0]
:cali-nat-outgoing - [0:0]
[106849:47239767] -A PREROUTING -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
[69252:4155120] -A OUTPUT -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
[619241:79620343] -A POSTROUTING -m comment --comment "cali:O3lYWMrLQYEMJtB5" -j cali-POSTROUTING
[451612:69562700] -A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
[451772:69577038] -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
[0:0] -A CNI-DN-d1d9e23f797943d48e271 -s 100.103.226.208/32 -p tcp -m tcp --dport 3001 -j CNI-HOSTPORT-SETMARK
[0:0] -A CNI-DN-d1d9e23f797943d48e271 -s 127.0.0.1/32 -p tcp -m tcp --dport 3001 -j CNI-HOSTPORT-SETMARK
[0:0] -A CNI-DN-d1d9e23f797943d48e271 -p tcp -m tcp --dport 3001 -j DNAT --to-destination 100.103.226.208:3001
[0:0] -A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"4084d0987639edbf35515fba0c0090bc30259e9f202b74a7e308f6c174c7f225\"" -m multiport --dports 3001 -j CNI-DN-d1d9e23f797943d48e271
[0:0] -A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE
[0:0] -A CNI-HOSTPORT-SETMARK -m comment --comment "CNI portfwd masquerade mark" -j MARK --set-xmark 0x2000/0x2000
[0:0] -A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
[0:0] -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
[451772:69577038] -A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
[0:0] -A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
[0:0] -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
[0:0] -A cali-POSTROUTING -m comment --comment "cali:ggLCSZ9_r9uXnHIo" -m comment --comment "BPF loopback SNAT" -m mark --mark 0x3600000/0x3f00000 -j MASQUERADE --random-fully
[619277:79622926] -A cali-POSTROUTING -m comment --comment "cali:THrmnLhAi3cxMLBO" -j cali-fip-snat
[619277:79622926] -A cali-POSTROUTING -m comment --comment "cali:ez7mOi3T8N6rtIlJ" -j cali-nat-outgoing
[0:0] -A cali-POSTROUTING -o vxlan.calico -m comment --comment "cali:Aah0tlBqoqE8PrS0" -m addrtype ! --src-type LOCAL --limit-iface-out -m addrtype --src-type LOCAL -j MASQUERADE --random-fully
[167645:10058700] -A cali-nat-outgoing -m comment --comment "cali:k9SjE1bWVYbIZgCm" -m mark --mark 0x3800000/0x3f00000 -j MASQUERADE --random-fully
COMMIT
# Completed on Mon Sep 19 14:10:22 2022
This machine is running Kubernetes 1.24.4 and Calico 3.23.3 atop Flatcar Container Linux 3277.1.2, reporting Linux kernel version 5.15.63.
@gongzh @seh heads up: this workaround will not work for 3.24+. As we understand the issue now, there is in fact little guarantee that a packet will go through iptables. We are doing all we can to keep packets out of iptables where possible. However, things like hostPort rely on iptables rules injected by a non-Calico entity, in the hostPort case the CNI plugin. We do not support that. We are currently working on how to identify traffic that is originally addressed to the host and then ends up at a pod/container, so that we can support that use case.
If I understand the last two sentences correctly, there's a gap between how the CNI plugin works to support host port traffic and how Calico handles traffic (relying on iptables versus trying to avoid iptables being involved), but you're intending to close that gap. If that's what you meant, do you expect that the fix or accommodation will make it into a 3.24-era release?
@seh right, we are looking into how to close the gap and we will most likely backport that too.
The issue is fixed and should work as expected in v3.25
@mazdakn could you backport it to 3.24 too? :pray:
Sure @tomastigera . Working on it, both for 3.23 and 3.24.
@tomastigera, if we use Calico version 3.24.2 with #6813 included, do we still need to set the "FELIX_BPFHostConntrackBypass" environment variable or use the new FelixConfiguration field (#6641), or are those workarounds obsolete now?
@seh Calico 3.24.2 should have the fix, so you do not need to set that configuration. If you would like to change it anyway, you can do so using either method.
To make sure I understand, is it correct that we don't need to set the "FELIX_BPFHostConntrackBypass" environment variable to "true," but if we still had it set because we hadn't removed it yet, things would be fine, and we wouldn't be subject to this defect either way?
As it stands now, the FELIX_BPFHostConntrackBypass setting does not affect hostPort behaviour, i.e. setting it to true or false won't matter for hostPort traffic. In both cases, hostPort should work fine.
@hakman, with this, we can revert kubernetes/kops#14205 once we adopt Calico version 3.24.2 or later.
See kubernetes/kops#14685 for removing this workaround in kOps.