K8s NodePort Service can only be accessed locally but not externally when using eBPF dataplane
Expected Behavior
I enabled the eBPF dataplane and installed a k8s Service with a backend Nginx pod. The YAML is:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    name: http
    nodePort: 30942
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        role: backend
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
```
I can access the NodePort service locally from the 2 k8s nodes (including the master), whose IPs are 10.169.208.204 and 10.169.208.233 (the master, tainted). But when I access the NodePort service from other servers (10.169.208.229, ...), it cannot be reached:
- For node 10.169.208.204, the result is:

  ```
  curl 10.169.208.204:30942
  curl: (7) Failed to connect to 10.169.208.204 port 30942: Connection refused
  ```

  The tcpdump result here:

- For node 10.169.208.233, the result is:

  ```
  curl 10.169.208.233:30942
  curl: (7) Failed to connect to 10.169.208.233 port 30942: Connection timed out
  ```
When I disabled eBPF and restored kube-proxy, the NodePort service could be accessed normally.
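As a sanity check, here is a minimal sketch of the checks that can be run around the switch; the resource names come from the manifests above, and the kube-proxy DaemonSet name is assumed to be the kubeadm default:

```bash
# Confirm the Service exposes nodePort 30942 and has a ready backend
kubectl get svc nginx -o wide
kubectl get endpoints nginx

# Confirm whether kube-proxy is still scheduled
# (DaemonSet name "kube-proxy" is the kubeadm default; adjust if different)
kubectl get ds -n kube-system kube-proxy
```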
Current Behavior
Possible Solution
Steps to Reproduce (for bugs)
The script to enable the eBPF dataplane:

```bash
k8s_ep=$(kubectl get endpoints kubernetes -o wide | grep kubernetes | cut -d " " -f 4)
k8s_host=$(echo $k8s_ep | cut -d ":" -f 1)
k8s_port=$(echo $k8s_ep | cut -d ":" -f 2)

cat <<EOF > ${WORKDIR}/k8s_service.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system
data:
  KUBERNETES_SERVICE_HOST: "KUBERNETES_SERVICE_HOST"
  KUBERNETES_SERVICE_PORT: "KUBERNETES_SERVICE_PORT"
EOF

sed -i "s/KUBERNETES_SERVICE_HOST/${k8s_host}/" ${WORKDIR}/k8s_service.yaml
sed -i "s/KUBERNETES_SERVICE_PORT/${k8s_port}/" ${WORKDIR}/k8s_service.yaml

kubectl apply -f ${WORKDIR}/k8s_service.yaml

echo "Disable kube-proxy:"
kubectl patch ds -n kube-system kube-proxy -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'

echo "Enable eBPF:"
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfEnabled": true}}'

echo "Enable Direct Server Return(DSR) mode: optional"
#calicoctl patch felixconfiguration default --patch='{"spec": {"bpfExternalServiceMode": "DSR"}}'
```
Context
Your Environment
- Calico version: v3.23.2 (also v3.23.1)
- Orchestrator version (e.g. kubernetes, mesos, rkt): k8s 1.22.1
- Operating System and version: Ubuntu 20.04, arm64
- Link to your project (optional):
This would be one for @tomas-mazak, though I believe he's out for a few days.
I think @caseydavenport should have pinged @tomastigera
Yep you're absolutely correct. :facepalm:
I do not quite follow what the problem is. It seems like when you are trying to access the nodeport from either of the 2 k8s nodes, you get a failure. Or am I reading it wrong, and you mean that if you access either of the two nodes from, say, 10.169.208.229, you get these 2 different errors? I think that is what you meant.

Is the actual pod running?
What is the output from `kubectl exec calico-node-xyz -n calico-system -- calico-node -bpf nat dump` for both of the nodes?

Could you also set the `BPFLogLevel` felix config to `Debug` and get us the output from `tc exec bpf debug >& tc.log` on the node which you are trying to connect to? Let's try the one with the pod first. Beware: it can be a lot of output.
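For reference, the requested steps would look roughly like this (a sketch; `calico-node-xyz` is a placeholder for the actual calico-node pod name on each node):

```bash
# Dump the BPF NAT maps from each node's calico-node pod
kubectl exec calico-node-xyz -n calico-system -- calico-node -bpf nat dump

# Raise the BPF program log level via the Felix configuration
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfLogLevel": "Debug"}}'

# Capture the eBPF debug trace on the node being tested (very verbose)
tc exec bpf debug >& tc.log
```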
Yes, the outputs above are from accessing the 2 k8s nodes from the server 10.169.208.229, which is not in the k8s cluster.
On node 10.169.208.204 (master, tainted):

```
# calico-node -bpf nat dump
2022-07-27 02:57:03.265 [INFO][2459138] confd/maps.go 433: Loaded map file descriptor. fd=0x9 name="/sys/fs/bpf/tc/globals/cali_v4_nat_fe3"
2022-07-27 02:57:03.265 [INFO][2459138] confd/maps.go 433: Loaded map file descriptor. fd=0xa name="/sys/fs/bpf/tc/globals/cali_v4_nat_be"
10.169.208.204 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
172.16.1.10 port 53 proto 6 id 3 count 2 local 2
        3:0   192.168.26.194:53
        3:1   192.168.26.195:53
172.16.1.10 port 53 proto 17 id 2 count 2 local 2
        2:0   192.168.26.194:53
        2:1   192.168.26.195:53
172.16.1.10 port 9153 proto 6 id 4 count 2 local 2
        4:0   192.168.26.194:9153
        4:1   192.168.26.195:9153
172.26.12.101 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
192.168.26.192 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
172.16.1.1 port 443 proto 6 id 1 count 1 local 0
        1:0   10.169.208.204:6443
172.16.1.147 port 80 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
172.17.0.1 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
192.168.122.1 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
192.168.202.25 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
255.255.255.255 port 30942 proto 6 id 0 count 2 local 1
        0:0   192.168.26.199:80
        0:1   192.168.4.3:80
```
On node 10.169.210.208:

```
# calico-node -bpf nat dump
2022-07-27 03:01:30.386 [INFO][2162491] confd/maps.go 433: Loaded map file descriptor. fd=0x9 name="/sys/fs/bpf/tc/globals/cali_v4_nat_fe3"
2022-07-27 03:01:30.387 [INFO][2162491] confd/maps.go 433: Loaded map file descriptor. fd=0xa name="/sys/fs/bpf/tc/globals/cali_v4_nat_be"
255.255.255.255 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
172.16.1.1 port 443 proto 6 id 0 count 1 local 0
        0:0   10.169.208.204:6443
172.16.1.10 port 53 proto 6 id 2 count 2 local 0
        2:0   192.168.26.194:53
        2:1   192.168.26.195:53
172.16.1.10 port 53 proto 17 id 1 count 2 local 0
        1:0   192.168.26.194:53
        1:1   192.168.26.195:53
172.17.0.1 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
192.168.4.0 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
10.169.210.108 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
172.16.1.10 port 9153 proto 6 id 3 count 2 local 0
        3:0   192.168.26.194:9153
        3:1   192.168.26.195:9153
172.16.1.147 port 80 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
172.26.12.100 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
192.168.122.1 port 30942 proto 6 id 4 count 2 local 1
        4:0   192.168.4.3:80
        4:1   192.168.26.199:80
```
pinging @tomastigera
This issue disappeared from release v3.25.0-dev (commit: c4f28c9666e8e5934c863b38ac162ced3f891bc6). The original version was v3.23.2.
@tomastigera is that expected?
How's it going? @tomastigera
I still hit this issue on the arm64 platform; I will do further analysis when I have time.