
HostPort of pod/container can't be accessed from other nodes in eBPF mode

Open gongzh opened this issue 3 years ago • 1 comments

After enabling Calico's eBPF mode, I created a pod that has a hostPort setting. I expected clients on every node and outside the cluster to be able to access hostIP:hostPort. Instead, it can be accessed only from the host where the pod is located; it can't be accessed from other nodes or from outside the cluster.

Expected Behavior

Clients on every node and outside the cluster can access hostIP:hostPort.

Current Behavior

It can be accessed only from the host where the pod is located. It can't be accessed from other nodes or from outside the cluster. When I run "curl hostIP:hostPort", it just hangs.

Possible Solution

Steps to Reproduce (for bugs)

  1. Stop kube-proxy on every node.
  2. Restart Linux.
  3. Enable the eBPF dataplane: calicoctl patch felixconfiguration default --patch='{"spec": {"bpfEnabled": true}}'
  4. Recreate all Calico pods and test pods.
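
The report does not include the test pod's spec. As a minimal sketch of what such a pod might look like, the helper below prints a manifest with a hostPort mapping. The function name, the pod name, the container name "web", and the nginx image are all hypothetical and chosen only for illustration.

```shell
# Print a minimal Pod manifest that maps a host port to a container port.
# Everything here (function name, container name, image) is an assumption
# for illustration; the original report does not show its pod spec.
hostport_pod_manifest() {
  local name="$1" host_port="$2" container_port="$3"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: ${container_port}
      hostPort: ${host_port}
      protocol: TCP
EOF
}
```

For example, `hostport_pod_manifest hostport-test 8080 80 | kubectl apply -f -` would create the pod, after which `curl hostIP:8080` can be tried from another node.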

Context

"curl PodIP:containerPort" is always successful from other nodes.

Your Environment

  • Calico version: v3.23.3
  • Orchestrator version: kubernetes 1.23.9
  • Operating System and version: CentOS 8.2
  • Link to your project (optional):

gongzh avatar Aug 10 '22 03:08 gongzh

@tomastigera

caseydavenport avatar Aug 16 '22 14:08 caseydavenport

See today's discussion in the "networking" channel of the "Calico Users" Slack team for a similar report.

In my case, I'm also using Calico version 3.23.3, Kubernetes 1.24.4, Flatcar Container Linux 3277.1.1, provisioned by kOps, running in AWS EC2.

Using tcpdump, I can see the following packet flow, with the client designated as C, the target container in the pod as T, and the EC2 instance hosting the container H. The container's requested host port is hp; its corresponding container port is cp.

  • C:* → H:hp on eth0 [SYN]
    This is the incoming packet from the client, trying to open the connection.
  • C:* → T:cp on cali* [SYN]
    This packet is forwarded via DNAT.
  • T:cp → C:* on cali* [SYN-ACK]
    This is the container responding to the connection request.

There are no more packets exchanged in that conversation. Between half a second and a full second later, the client sends the opening packet again, and the same three-packet pattern repeats.

On the client's machine (C above), using tcpdump to inspect the traffic, I see the client's outbound packets trying to open the connection, but no packets make it back from either the EC2 instance H or the pod T.

The EC2 instance's ENI's source/destination checks are disabled, and the ENI's security group rules allow all egress using both TCP and UDP.

On the server machine H, the following iptables rules mention the host port (29888 in my example, with 100.110.179.72 as the target pod's IP address):

-A CNI-DN-7afd689b74012ed42abdd -s 100.110.179.72/32 -p tcp -m tcp --dport 29888 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7afd689b74012ed42abdd -s 127.0.0.1/32 -p tcp -m tcp --dport 29888 -j CNI-HOSTPORT-SETMARK
-A CNI-DN-7afd689b74012ed42abdd -p tcp -m tcp --dport 29888 -j DNAT --to-destination 100.110.179.72:29888
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"f97edae62bbd47059576974a4f8fd0204e17193acd9ca57bf9e369b5400c53eb\"" -m multiport --dports 29888 -j CNI-DN-7afd689b74012ed42abdd

Using iptables-save -c, I can see the counters for these rules incrementing steadily, showing that the rules are hit by this packet flow.
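
As a rough sketch of that check, the filter below reads "iptables-save -c" output on standard input and prints only the rules that mention a given destination port and have a non-zero packet counter. The function name is hypothetical, and the match is deliberately simple: it handles a single port after --dport or --dports, not comma-separated port lists.

```shell
# Print counter-prefixed rules from `iptables-save -c` output (on stdin)
# that match the given destination port and have been hit at least once.
# hostport_hits is a hypothetical helper name, not part of any tool.
hostport_hits() {
  local port="$1"
  awk -v port="$port" '
    # Rule lines look like: [packets:bytes] -A CHAIN ...
    /^\[[0-9]+:[0-9]+\] -A / {
      # Strip the surrounding brackets and split "packets:bytes".
      split(substr($1, 2, length($1) - 2), c, ":")
      # Keep the rule only if packets > 0 and the port matches exactly.
      if (c[1] + 0 > 0 && $0 ~ ("--dports? " port "( |$)")) print
    }
  '
}
```

Running `iptables-save -c | hostport_hits 29888` as root twice, a few seconds apart, shows whether the packet counts on those rules are still climbing.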

What could be blocking these return packets from leaving the target container's EC2 instance?

seh avatar Aug 30 '22 17:08 seh

Per Tomas's advice, setting the "FELIX_BPFHostConntrackBypass" environment variable on the "calico-node" containers to "false" alleviates the problem.

seh avatar Aug 30 '22 18:08 seh

@gongzh the solution is to set FELIX_BPFHostConntrackBypass=false. You have to set it as an environment variable for now, as it is missing from the Felix configuration resource. That will be fixed.
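
One way to apply that environment variable is sketched below, assuming a manifest-based install where the calico-node DaemonSet lives in the kube-system namespace. Both names are assumptions and may differ in your cluster; on operator-managed installs a direct edit to the DaemonSet may be reverted. The KUBECTL variable exists only so the command can be swapped out for a dry run.

```shell
# Assumed defaults; override them to match your installation.
NAMESPACE="${NAMESPACE:-kube-system}"
DAEMONSET="${DAEMONSET:-calico-node}"
KUBECTL="${KUBECTL:-kubectl}"

# Set FELIX_BPFHostConntrackBypass on the calico-node DaemonSet.
# Changing the pod template triggers a rolling restart of the pods.
set_conntrack_bypass() {
  "$KUBECTL" set env --namespace "$NAMESPACE" "daemonset/$DAEMONSET" \
    "FELIX_BPFHostConntrackBypass=$1"
}
```

Calling `set_conntrack_bypass false` applies the workaround described here; `KUBECTL=echo set_conntrack_bypass false` prints the arguments without touching the cluster.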

tomastigera avatar Aug 30 '22 18:08 tomastigera

@tomastigera @seh many thanks, and looking forward to the new config option BPFDisableLinuxConntrack.

gongzh avatar Sep 05 '22 12:09 gongzh

@seh Can you please share the complete iptables rules that you got from iptables-save -c?

mazdakn avatar Sep 18 '22 22:09 mazdakn

Here's the output from iptables-save -c run on a machine hosting at least one pod using a host port. In this case, look for TCP port 3001, used by a pod with IP address 100.103.226.208.

"iptables-save -c" output
# Generated by iptables-save v1.8.7 on Mon Sep 19 14:10:22 2022
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KUBE-IPTABLES-HINT - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
COMMIT
# Completed on Mon Sep 19 14:10:22 2022
# Generated by iptables-save v1.8.7 on Mon Sep 19 14:10:22 2022
*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:cali-OUTPUT - [0:0]
:cali-PREROUTING - [0:0]
:cali-rpf - [0:0]
:cali-to-host-endpoint - [0:0]
:cali-untracked-flows - [0:0]
:cali-untracked-policy - [0:0]
[11857277:21309286804] -A PREROUTING -m comment --comment "cali:6gwbT8clXdHdC1b1" -j cali-PREROUTING
[3386567:11879945822] -A OUTPUT -m comment --comment "cali:tVnHkvAo15HuiPy0" -j cali-OUTPUT
[3386567:11879945822] -A cali-OUTPUT -m comment --comment "cali:njdnLwYeGqBJyMxW" -j MARK --set-xmark 0x0/0xf0000
[3386567:11879945822] -A cali-OUTPUT -m comment --comment "cali:rz86uTUcEZAfFsh7" -j cali-to-host-endpoint
[0:0] -A cali-OUTPUT -m comment --comment "cali:bLB0nfIOyylxkMRl" -m mark --mark 0x10000/0x10000 -j MARK --set-xmark 0x3000000/0xffffffff
[387:58925] -A cali-OUTPUT -m comment --comment "cali:POF4qT4U5ax-xBuY" -m mark --mark 0x3000000 -j ACCEPT
[11857277:21309286804] -A cali-PREROUTING -m comment --comment "cali:6mAAmzbAxBnZjA7b" -j cali-rpf
[5308331:16323287670] -A cali-PREROUTING -m comment --comment "cali:Xd1cllrbNxZeQidT" -m addrtype ! --dst-type LOCAL -g cali-untracked-flows
[0:0] -A cali-PREROUTING -m comment --comment "cali:13gv11A-lA-bpCYg" -m comment --comment "Jump to target for packets with Bypass mark" -m mark --mark 0x3000000 -g cali-untracked-policy
[83:31785] -A cali-rpf -m comment --comment "cali:6xGtAA2dcFGJ69f4" -m comment --comment "Skip RPF if requested" -m mark --mark 0x3400000/0x3f00000 -j RETURN
[1144020:497651066] -A cali-rpf -m comment --comment "cali:Rtjbbkp32IBYWoqM" -m comment --comment "Skip RPF on packets returning from tunnel to the client" -m mark --mark 0x3300000/0x3f00000 -j RETURN
[0:0] -A cali-rpf -m comment --comment "cali:K2Ow3gMMq7e4Oklj" -m mark --mark 0x3300000/0x1ff00000 -m rpfilter --validmark --accept-local -j RETURN
[0:0] -A cali-rpf -m comment --comment "cali:VsYL4esKcy1qDtDI" -m mark --mark 0x1000000/0x1000000 -m rpfilter --validmark --invert -j DROP
[0:0] -A cali-untracked-policy -m comment --comment "cali:GmTHf-cZvJXoOqUt" -j MARK --set-xmark 0x0/0x0
[0:0] -A cali-untracked-policy -m comment --comment "cali:QJsxuYT1xa6WFzwv" -j NOTRACK
COMMIT
# Completed on Mon Sep 19 14:10:22 2022
# Generated by iptables-save v1.8.7 on Mon Sep 19 14:10:22 2022
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:cali-to-wl-dispatch - [0:0]
:cali-to-wl-dispatch-5 - [0:0]
[1172097:1019988079] -A INPUT -m comment --comment "cali:zkuE8qdwsVpH6Kd2" -m comment --comment "Accept packets from flows that pre-date BPF." -m mark --mark 0x5000000/0x5000000 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
[131:40990] -A INPUT -m comment --comment "cali:XQL0mC-L6wldZdgN" -m comment --comment "Drop packets from unknown flows." -m mark --mark 0x5000000/0x5000000 -j DROP
[288840:22965070] -A INPUT -i cali+ -m comment --comment "cali:pbFdTFCLcV-MVLSS" -m mark --mark 0x1000000/0x1000000 -j ACCEPT
[0:0] -A INPUT -i cali+ -m comment --comment "cali:u_TyW7ph8QsYnThE" -m mark ! --mark 0x1000000/0x1000000 -j DROP
[2365445:3596958553] -A INPUT -j KUBE-FIREWALL
[4869408:5466201263] -A FORWARD -m comment --comment "cali:umcmOn0WnTNOKJrp" -m comment --comment "Pre-approved by BPF programs." -m mark --mark 0x3000000/0x3000000 -j ACCEPT
[0:0] -A FORWARD -i cali+ -m comment --comment "cali:NnQ109Z-tVFkJGc1" -m comment --comment "From workload without BPF seen mark" -m mark ! --mark 0x1000000/0x1000000 -j DROP
[2759095:351663751] -A FORWARD -m comment --comment "cali:YmI_zfAgHIHbINEV" -m comment --comment "Mark pre-established flows." -m conntrack --ctstate RELATED,ESTABLISHED -j MARK --set-xmark 0x8000000/0x8000000
[2831498:392219529] -A FORWARD -o cali+ -m comment --comment "cali:-EFgmtwMJVO64q9s" -m comment --comment "To workload, check workload is known." -j cali-to-wl-dispatch
[331833:10815138912] -A FORWARD -i cali+ -m comment --comment "cali:wP1i1sEU71uRzM5d" -m comment --comment "To workload, mark has already been verified." -j ACCEPT
[2853447:1029960252] -A OUTPUT -m comment --comment "cali:jV0u_0SICA6NQt9x" -m comment --comment "Mark pre-established host flows." -m conntrack --ctstate RELATED,ESTABLISHED -j MARK --set-xmark 0x8000000/0x8000000
[3388442:11880278673] -A OUTPUT -j KUBE-FIREWALL
[0:0] -A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
[0:0] -A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
[1708669:146634066] -A cali-to-wl-dispatch -o cali5+ -m comment --comment "cali:s-h3O_VJJvyg0nUF" -g cali-to-wl-dispatch-5
[0:0] -A cali-to-wl-dispatch -o calid9ab8c033f9 -m comment --comment "cali:yP0JWT3bsoz6XJhH" -j ACCEPT
[1122829:245585463] -A cali-to-wl-dispatch -o calie2295745132 -m comment --comment "cali:voUbXKKnPFRrVLxK" -j ACCEPT
[0:0] -A cali-to-wl-dispatch -m comment --comment "cali:YRYUSAaB3tib9T3-" -m comment --comment "Unknown interface" -j DROP
[1688220:135901471] -A cali-to-wl-dispatch-5 -o cali51fba186272 -m comment --comment "cali:pUR7T3sVzy_bnssZ" -j ACCEPT
[20449:10732595] -A cali-to-wl-dispatch-5 -o cali5f35fcc4c93 -m comment --comment "cali:8L1wwuUAT1IJDURU" -j ACCEPT
[0:0] -A cali-to-wl-dispatch-5 -m comment --comment "cali:u_U11TQABMOZ13O3" -m comment --comment "Unknown interface" -j DROP
COMMIT
# Completed on Mon Sep 19 14:10:22 2022
# Generated by iptables-save v1.8.7 on Mon Sep 19 14:10:22 2022
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:CNI-DN-d1d9e23f797943d48e271 - [0:0]
:CNI-HOSTPORT-DNAT - [0:0]
:CNI-HOSTPORT-MASQ - [0:0]
:CNI-HOSTPORT-SETMARK - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-POSTROUTING - [0:0]
:cali-POSTROUTING - [0:0]
:cali-fip-snat - [0:0]
:cali-nat-outgoing - [0:0]
[106849:47239767] -A PREROUTING -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
[69252:4155120] -A OUTPUT -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
[619241:79620343] -A POSTROUTING -m comment --comment "cali:O3lYWMrLQYEMJtB5" -j cali-POSTROUTING
[451612:69562700] -A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
[451772:69577038] -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
[0:0] -A CNI-DN-d1d9e23f797943d48e271 -s 100.103.226.208/32 -p tcp -m tcp --dport 3001 -j CNI-HOSTPORT-SETMARK
[0:0] -A CNI-DN-d1d9e23f797943d48e271 -s 127.0.0.1/32 -p tcp -m tcp --dport 3001 -j CNI-HOSTPORT-SETMARK
[0:0] -A CNI-DN-d1d9e23f797943d48e271 -p tcp -m tcp --dport 3001 -j DNAT --to-destination 100.103.226.208:3001
[0:0] -A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"k8s-pod-network\" id: \"4084d0987639edbf35515fba0c0090bc30259e9f202b74a7e308f6c174c7f225\"" -m multiport --dports 3001 -j CNI-DN-d1d9e23f797943d48e271
[0:0] -A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE
[0:0] -A CNI-HOSTPORT-SETMARK -m comment --comment "CNI portfwd masquerade mark" -j MARK --set-xmark 0x2000/0x2000
[0:0] -A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
[0:0] -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
[451772:69577038] -A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
[0:0] -A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
[0:0] -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
[0:0] -A cali-POSTROUTING -m comment --comment "cali:ggLCSZ9_r9uXnHIo" -m comment --comment "BPF loopback SNAT" -m mark --mark 0x3600000/0x3f00000 -j MASQUERADE --random-fully
[619277:79622926] -A cali-POSTROUTING -m comment --comment "cali:THrmnLhAi3cxMLBO" -j cali-fip-snat
[619277:79622926] -A cali-POSTROUTING -m comment --comment "cali:ez7mOi3T8N6rtIlJ" -j cali-nat-outgoing
[0:0] -A cali-POSTROUTING -o vxlan.calico -m comment --comment "cali:Aah0tlBqoqE8PrS0" -m addrtype ! --src-type LOCAL --limit-iface-out -m addrtype --src-type LOCAL -j MASQUERADE --random-fully
[167645:10058700] -A cali-nat-outgoing -m comment --comment "cali:k9SjE1bWVYbIZgCm" -m mark --mark 0x3800000/0x3f00000 -j MASQUERADE --random-fully
COMMIT
# Completed on Mon Sep 19 14:10:22 2022

This machine is running Kubernetes 1.24.4 and Calico 3.23.3 atop Flatcar Container Linux 3277.1.2, reporting Linux kernel version 5.15.63.

seh avatar Sep 19 '22 14:09 seh

@gongzh @seh heads up: this workaround will not work for 3.24+. As we understand the issue now, there is in fact little guarantee that a packet will go through iptables. In fact, we are doing all we can to keep packets from going through iptables where possible. However, features like hostPort rely on iptables rules injected by a non-Calico entity; in the hostPort case, that is the CNI plugin. We do not support that. However, we are currently working on how to identify traffic that is originally addressed to the host and then ends up at a pod/container, so that we can support that use case.

tomastigera avatar Sep 20 '22 23:09 tomastigera

If I understand the last two sentences correctly, there's a gap between how the CNI plugin works to support host port traffic and how Calico handles traffic (relying on iptables versus trying to avoid iptables being involved), but you're intending to close that gap. If that's what you meant, do you expect that the fix or accommodation will make it into a 3.24-era release?

seh avatar Sep 21 '22 00:09 seh

@seh right, we are looking into how to close the gap and we will most likely backport that too.

tomastigera avatar Sep 21 '22 01:09 tomastigera

The issue is fixed and should work as expected in v3.25

tomastigera avatar Oct 04 '22 22:10 tomastigera

@mazdakn could you backport it to 3.24 too? :pray:

tomastigera avatar Oct 04 '22 22:10 tomastigera

Sure @tomastigera . Working on it, both for 3.23 and 3.24.

mazdakn avatar Oct 04 '22 22:10 mazdakn

@tomastigera, if we use Calico version 3.24.2 with #6813 included, do we still need to set the "FELIX_BPFHostConntrackBypass" environment variable or use the new FelixConfiguration field (#6641), or are those workarounds obsolete now?

seh avatar Oct 19 '22 12:10 seh

@seh Calico 3.24.2 should have the fix, so you do not need to set that configuration. If you would like to change it, you can do so using either method.

mazdakn avatar Oct 19 '22 18:10 mazdakn

To make sure I understand, is it correct that we don't need to set the "FELIX_BPFHostConntrackBypass" environment variable to "true," but if we still had it set like that because we hadn't removed it yet, things would be fine, and we wouldn't be subject to this defect either way?

seh avatar Oct 19 '22 18:10 seh

As it stands now, the FELIX_BPFHostConntrackBypass setting does not affect hostPort behaviour, i.e. setting it to true or false won't matter for hostPort traffic. In both cases, hostPort should work fine.

mazdakn avatar Oct 19 '22 21:10 mazdakn

@hakman, with this, we can revert kubernetes/kops#14205 once we adopt Calico version 3.24.2 or later.

seh avatar Oct 19 '22 21:10 seh

See kubernetes/kops#14685 for removing this workaround in kOps.

seh avatar Nov 28 '22 13:11 seh