
Cannot access UDP service via ClusterIP from the host

Open jclab-joseph opened this issue 2 years ago • 5 comments

Expected Behavior

UDP should work.

Current Behavior

Cannot access UDP service via ClusterIP from the host.

Queries to kube-dns fail from Pods with hostNetwork: true.

See https://github.com/kubernetes/kubernetes/issues/112614

Steps to Reproduce (for bugs)

pod.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: dns-test
  name: dns-test
  namespace: default
spec:
  containers:
  - args:
    - /bin/sh
    - -c
    - sleep 1000
    image: debian:bullseye
    imagePullPolicy: IfNotPresent
    name: dns-test
  dnsPolicy: ClusterFirstWithHostNet
  hostNetwork: true

In pod:

# nslookup kubernetes.default.svc.cluster.local
;; connection timed out; no servers could be reached

If not hostNetwork: true it works fine. Also, if I change the server in nslookup to the pod IP, it works fine.
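The ClusterIP versus pod IP comparison above can also be reproduced without nslookup. The following is a minimal sketch, not part of the original report: it sends one hand-built DNS A query over UDP to each server given on the command line and reports whether any reply arrives. The function names `build_dns_query` and `probe_udp_dns` are made up for this illustration, and the 2-second timeout is an arbitrary choice.

```python
import socket
import struct
import sys


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS A-record query in RFC 1035 wire format."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe_udp_dns(server: str, name: str, timeout: float = 2.0) -> bool:
    """Return True if a UDP reply with the matching query ID arrives in time."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes socket timeouts
            return False
    return reply[:2] == query[:2]


if __name__ == "__main__":
    # Pass the servers to compare, e.g. the kube-dns ClusterIP and a CoreDNS pod IP.
    for server in sys.argv[1:]:
        ok = probe_udp_dns(server, "kubernetes.default.svc.cluster.local")
        print(f"{server}: {'reply received' if ok else 'no reply'}")
```

Run from the hostNetwork pod or from the node itself, with the service ClusterIP and the pod IP as arguments, this should show the same asymmetry as the nslookup test if the problem is present.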

Your Environment

  • Calico version : v3.18.1
  • Orchestrator version (e.g. kubernetes, mesos, rkt):
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:28:09Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.12", GitCommit:"4bf2e32bb2b9fdeea19ff7cdc1fb51fb295ec407", GitTreeState:"clean", BuildDate:"2021-10-27T17:07:18Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
  • Operating System and version: Ubuntu 20.04 LTS
  • Link to your project (optional):

More information

It works if the Pods are on the same node, but not between different nodes.

(node-1):
08:46:08.041819 IP 172.30.194.128.50941 > 172.21.202.9.53: 65345+ A? kubernetes.default.svc.cluster.local.default.svc.cluster.local. (80)

(node-1) # ip addr | grep 172.30.194.128
    inet 172.30.194.128/32 scope global vxlan.calico

# coredns-74ff55c5b-47qpr                           1/1     Running   0          6m34s   172.21.202.9   <none>      node-2     <none>

No packets are captured in vxlan.calico on node-2.

...

node-2: ICMP works but UDP doesn't.

09:10:55.189157 IP 172.30.194.128 > 172.21.202.9: ICMP echo request, id 2, seq 2, length 64
09:10:55.189386 IP 172.21.202.9 > 172.30.194.128: ICMP echo reply, id 2, seq 2, length 64

jclab-joseph avatar Sep 27 '22 01:09 jclab-joseph

If not hostNetwork: true it works fine.

The packet path for pod<->pod in VXLAN mode is slightly different than node<->pod.

For node to pod traffic, I would expect routing to send the packet via the Calico VXLAN device and that the kernel would choose the source address assigned to the Calico VXLAN device - the selection of the address is important since it will be used on the receiving node to route traffic back using the VXLAN tunnel rather than unencap'd.
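One way to check the source address selection described above, sketched here as an illustration rather than an official Calico procedure: calling `connect()` on a UDP socket sends no packets but makes the kernel perform the route lookup and pick a local address, which `getsockname()` then reports. The helper name `selected_source_address` is hypothetical.

```python
import socket


def selected_source_address(destination: str, port: int = 53) -> str:
    """Return the local IPv4 address the kernel picks to reach `destination`.

    connect() on a UDP socket transmits nothing; it only resolves the route
    and fixes the socket's source address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((destination, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    import sys

    for destination in sys.argv[1:]:
        print(f"{destination}: source {selected_source_address(destination)}")
```

On node-1, passing the remote pod IP from the capture (172.21.202.9) would be expected to print the vxlan.calico address if routing behaves as described. Note that this reflects only the route lookup for the address given; it does not account for any NAT applied afterwards.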

From what I can gather, this appears to be working as expected for you - the ICMP cross-host traffic seems to be working fine.

It appears that you are seeing the encap'd packet leave the host, but that it's not arriving on the remote host - might there be some firewall rule update required in your network to allow that traffic?

caseydavenport avatar Oct 12 '22 21:10 caseydavenport

@caseydavenport There are no firewall settings. In iptables there are only rules generated by Kubernetes.

jclab-joseph avatar Oct 12 '22 21:10 jclab-joseph

Where is your cluster running? On-prem? Public cloud?

caseydavenport avatar Oct 18 '22 17:10 caseydavenport

On-prem with kubeadm

jclab-joseph avatar Oct 18 '22 22:10 jclab-joseph

So I think we need to pinpoint where the traffic is being dropped. Probably need to take a look at the source node's iptables packet counts to see if there are any rules incrementing when sending UDP traffic.
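A rough way to do the counter comparison suggested above, assuming two snapshots taken with `iptables-save -c` on the source node (one before and one after sending the UDP traffic): the sketch below parses the `[packets:bytes]` prefix on each rule and lists the rules whose packet count grew. The function names are invented for this example, and identical rule text appearing twice in one table would be collapsed into a single entry.

```python
import re

# Rule lines in `iptables-save -c` output look like: [packets:bytes] -A CHAIN ...
COUNTER_RULE = re.compile(r"^\[(\d+):(\d+)\]\s+(-A .+)$")


def parse_rule_counters(dump: str) -> dict[str, int]:
    """Map 'table: rule' to its packet count from an `iptables-save -c` dump."""
    counters: dict[str, int] = {}
    table = ""
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("*"):  # table header, e.g. *filter or *nat
            table = line[1:]
            continue
        match = COUNTER_RULE.match(line)
        if match:
            counters[f"{table}: {match.group(3)}"] = int(match.group(1))
    return counters


def incremented_rules(before: str, after: str) -> dict[str, int]:
    """Return rules whose packet count increased, with the size of the increase."""
    old = parse_rule_counters(before)
    new = parse_rule_counters(after)
    return {
        rule: count - old.get(rule, 0)
        for rule, count in new.items()
        if count > old.get(rule, 0)
    }


if __name__ == "__main__":
    import sys

    # Usage: script.py before.txt after.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
            for rule, delta in incremented_rules(f_before.read(), f_after.read()).items():
                print(f"+{delta}  {rule}")
```

On a busy node many rules will increment regardless, so it helps to keep the interval between the two snapshots short and to look specifically for DROP, REJECT, or MASQUERADE rules in the result.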

It would also be nice if we could see a capture that shows the traffic as seen on the VXLAN interface, but also the primary interface on the node.

Then again on the remote host - is it receiving the encap'd packet and then dropping it rather than decapsulating it? Or is it never receiving the packet at all?

caseydavenport avatar Oct 18 '22 22:10 caseydavenport

@jclab-joseph have you been able to resolve this issue? I also just ran into the exact same problem.

pckbls avatar Dec 01 '22 13:12 pckbls

Can reopen if more info is provided.

tomastigera avatar Mar 21 '23 17:03 tomastigera