K8s NodePort Service can only be accessed locally but not externally when using eBPF dataplane
Expected Behavior
I enabled the eBPF dataplane and installed a k8s Service with a backend Nginx pod. The YAML is:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    name: http
    nodePort: 30942
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        role: backend
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
```
I can access the NodePort service locally from the 2 k8s nodes (including the master), whose IPs are 10.169.208.204 and 10.169.208.233 (the master, tainted). But when I access the NodePort service from other servers (10.169.208.229, ...), it cannot be reached:
- For node 10.169.208.204, the result is:

  ```
  curl 10.169.208.204:30942
  curl: (7) Failed to connect to 10.169.208.204 port 30942: Connection refused
  ```

  The tcpdump result here:

- For node 10.169.208.233, the result is:

  ```
  curl 10.169.208.233:30942
  curl: (7) Failed to connect to 10.169.208.233 port 30942: Connection timed out
  ```
When I disabled eBPF and restored kube-proxy, the NodePort service could be accessed normally.
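As a sanity check, here is a minimal sketch of the checks that can be run around the switch; the resource names come from the manifests above, and the kube-proxy DaemonSet name is assumed to be the kubeadm default:

```bash
# Confirm the Service exposes nodePort 30942 and has a ready backend
kubectl get svc nginx -o wide
kubectl get endpoints nginx

# Confirm whether kube-proxy is still scheduled
# (DaemonSet name "kube-proxy" is the kubeadm default; adjust if different)
kubectl get ds -n kube-system kube-proxy
```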
Current Behavior
Possible Solution
Steps to Reproduce (for bugs)
The script to enable the eBPF dataplane:

```bash
k8s_ep=$(kubectl get endpoints kubernetes -o wide | grep kubernetes | cut -d " " -f 4)
k8s_host=$(echo $k8s_ep | cut -d ":" -f 1)
k8s_port=$(echo $k8s_ep | cut -d ":" -f 2)

cat <<EOF > ${WORKDIR}/k8s_service.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system
data:
  KUBERNETES_SERVICE_HOST: "KUBERNETES_SERVICE_HOST"
  KUBERNETES_SERVICE_PORT: "KUBERNETES_SERVICE_PORT"
EOF

sed -i "s/KUBERNETES_SERVICE_HOST/${k8s_host}/" ${WORKDIR}/k8s_service.yaml
sed -i "s/KUBERNETES_SERVICE_PORT/${k8s_port}/" ${WORKDIR}/k8s_service.yaml

kubectl apply -f ${WORKDIR}/k8s_service.yaml

echo "Disable kube-proxy:"
kubectl patch ds -n kube-system kube-proxy -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'

echo "Enable eBPF:"
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfEnabled": true}}'

echo "Enable Direct Server Return(DSR) mode: optional"
#calicoctl patch felixconfiguration default --patch='{"spec": {"bpfExternalServiceMode": "DSR"}}'
```
Context
Your Environment
- Calico version: v3.23.2 (also v3.23.1)
- Orchestrator version (e.g. kubernetes, mesos, rkt): k8s 1.22.1
- Operating System and version: Ubuntu 20.04, arm64
- Link to your project (optional):
This would be one for @tomas-mazak, though I believe he's out for a few days.
I think @caseydavenport should have pinged @tomastigera
Yep you're absolutely correct. :facepalm:
I do not quite follow what the problem is. It seems like when you are trying to access the nodeport from either of the 2 k8s nodes, you get a failure. Or am I reading it wrong, and you mean that if you access either of the two nodes from, say, 10.169.208.229, you get these 2 different errors? I think that is what you meant.

Is the actual pod running?
What is the output from `kubectl exec calico-node-xyz -n calico-system -- calico-node -bpf nat dump` for both of the nodes?

Could you also set the `BPFLogLevel` felix config to `Debug` and get us the output from `tc exec bpf debug >& tc.log` on the node which you are trying to connect to? Let's try the one with the pod first. Beware: it can be a lot of output.
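For reference, the requested steps would look roughly like this (a sketch; `calico-node-xyz` is a placeholder for the actual calico-node pod name on each node):

```bash
# Dump the BPF NAT maps from each node's calico-node pod
kubectl exec calico-node-xyz -n calico-system -- calico-node -bpf nat dump

# Raise the BPF program log level via the Felix configuration
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfLogLevel": "Debug"}}'

# Capture the eBPF debug trace on the node being tested (very verbose)
tc exec bpf debug >& tc.log
```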
Yes, the outputs above are from accessing the 2 k8s nodes from the server 10.169.208.229, which is not in the k8s cluster.
On node 10.169.208.204 (master, tainted):

```
# calico-node -bpf nat dump
2022-07-27 02:57:03.265 [INFO][2459138] confd/maps.go 433: Loaded map file descriptor. fd=0x9 name="/sys/fs/bpf/tc/globals/cali_v4_nat_fe3"
2022-07-27 02:57:03.265 [INFO][2459138] confd/maps.go 433: Loaded map file descriptor. fd=0xa name="/sys/fs/bpf/tc/globals/cali_v4_nat_be"
10.169.208.204 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
172.16.1.10 port 53 proto 6 id 3 count 2 local 2
        3:0   192.168.26.194:53
        3:1   192.168.26.195:53
172.16.1.10 port 53 proto 17 id 2 count 2 local 2
        2:0   192.168.26.194:53
        2:1   192.168.26.195:53
172.16.1.10 port 9153 proto 6 id 4 count 2 local 2
        4:0   192.168.26.194:9153
        4:1   192.168.26.195:9153
172.26.12.101 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
192.168.26.192 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
172.16.1.1 port 443 proto 6 id 1 count 1 local 0
        1:0   10.169.208.204:6443
172.16.1.147 port 80 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
172.17.0.1 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
192.168.122.1 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
192.168.202.25 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
255.255.255.255 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
```
On node 10.169.210.208:

```
# calico-node -bpf nat dump
2022-07-27 03:01:30.386 [INFO][2162491] confd/maps.go 433: Loaded map file descriptor. fd=0x9 name="/sys/fs/bpf/tc/globals/cali_v4_nat_fe3"
2022-07-27 03:01:30.387 [INFO][2162491] confd/maps.go 433: Loaded map file descriptor. fd=0xa name="/sys/fs/bpf/tc/globals/cali_v4_nat_be"
255.255.255.255 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
172.16.1.1 port 443 proto 6 id 0 count 1 local 0
        0:0   10.169.208.204:6443
172.16.1.10 port 53 proto 6 id 2 count 2 local 0
        2:0   192.168.26.194:53
        2:1   192.168.26.195:53
172.16.1.10 port 53 proto 17 id 1 count 2 local 0
        1:0   192.168.26.194:53
        1:1   192.168.26.195:53
172.17.0.1 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
192.168.4.0 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
10.169.210.108 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
172.16.1.10 port 9153 proto 6 id 3 count 2 local 0
        3:0   192.168.26.194:9153
        3:1   192.168.26.195:9153
172.16.1.147 port 80 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
172.26.12.100 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
192.168.122.1 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
```
pinging @tomastigera
This issue disappeared from release v3.25.0-dev (commit: c4f28c9666e8e5934c863b38ac162ced3f891bc6). The original version was v3.23.2.
@tomastigera is that expected?
How's it going? @tomastigera
I still hit this issue on the arm64 platform; I will do further analysis when I have time.