
WireGuard client unable to resolve DNS, e.g., redis.redis.svc.cluster.local

Open · yepchaos opened this issue on Nov 14 '24 · 4 comments

Hi, my WireGuard client is unable to resolve cluster DNS names, e.g., redis.redis.svc.cluster.local.

My server values.yaml is:

replicaCount: 1
autoscaling:
  enabled: false

service:
  enabled: true
  type: ClusterIP
wireguard:
  serverAddress: 172.32.32.1/24
  serverCidr: 172.32.32.0/24
  clients:
  - AllowedIPs: 172.32.32.2/32
    PublicKey: iaWRm9zdDyM95FXgoUpGNI2seN7vXyoQVG78ODGGJHY=

I checked the wg pod and ran wg show wg0, and it looks fine:

$ sudo wg show wg0
interface: wg0
  public key: EsPzisDRhRc5cpVHg5TSjfnWWkA6m82nhKczIxcZtU8=
  private key: (hidden)
  listening port: 51820

peer: iaWRm9zdDyM95FXgoUpGNI2seN7vXyoQVG78ODGGJHY=
  endpoint: 10.0.1.107:57281
  allowed ips: 172.32.32.2/32
  latest handshake: 10 seconds ago
  transfer: 173.18 KiB received, 248.56 KiB sent

From the pod I tried an nslookup, and it works fine:

~ $ nslookup redis.redis.svc.cluster.local
Server:		10.43.0.10
Address:	10.43.0.10:53


Name:	redis.redis.svc.cluster.local
Address: 10.43.113.234

My client config is:

[Interface]
PrivateKey = <privateKey>
Address = 172.32.32.2/32
DNS = 10.43.0.10, 8.8.8.8

[Peer]
PublicKey = EsPzisDRhRc5cpVHg5TSjfnWWkA6m82nhKczIxcZtU8=
AllowedIPs = 10.0.0.0/16, 10.43.0.0/16, 172.32.32.0/24
Endpoint = <public_id>:51820
PersistentKeepalive = 25

Ping to the server address works fine too:

ping 172.32.32.1
PING 172.32.32.1 (172.32.32.1): 56 data bytes
64 bytes from 172.32.32.1: icmp_seq=0 ttl=64 time=9.281 ms
64 bytes from 172.32.32.1: icmp_seq=1 ttl=64 time=9.031 ms
64 bytes from 172.32.32.1: icmp_seq=2 ttl=64 time=13.573 ms

But from the client, nslookup redis.redis.svc.cluster.local fails, and a traceroute to 10.43.113.234 stops after the first hop:

traceroute 10.43.113.234
traceroute to 10.43.113.234 (10.43.113.234), 64 hops max, 52 byte packets
 1  172.32.32.1 (172.32.32.1)  6.532 ms  8.060 ms  7.810 ms
 2  * * * *

I can't figure this out; any help is appreciated. I want to reach 10.43.* from my client (DNS is optional). I'm using k3s + Cilium.

yepchaos · Nov 14 '24

I see that your Helm configuration defines the WireGuard service as a ClusterIP service:

service:
  enabled: true
  type: ClusterIP

What kind of load balancer or network path are you using to access the wg service from your client?

What pod CIDR are you using for Cilium? I believe it defaults to 10.0.0.0/8, and Cilium allocates each node a /24 within that /8.

Do you have Hubble deployed with Cilium, and can you access the Hubble UI over a port forward? Does it show any flows/verdicts for your wg client's traffic?
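
If it helps, the pod CIDR and the Hubble UI can be checked with something like this (a rough sketch; it assumes a default Cilium/Hubble install in kube-system, so adjust names to your cluster):

# show the pod CIDR Cilium was configured with
kubectl -n kube-system get configmap cilium-config -o yaml | grep -i cluster-pool

# port-forward the Hubble UI (only if it is deployed), then open http://localhost:12000
kubectl -n kube-system port-forward svc/hubble-ui 12000:80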

bryopsida · Nov 16 '24

I'm using nginx-ingress as the load balancer; here's the configuration:

udp:
  "51820": wireguard/wireguard-wireguard:51820

For Cilium, I'm not using Hubble. The pod CIDR is:

cluster-pool-ipv4-cidr: 10.0.0.0/8
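
For reference, that UDP path can be sanity-checked with something like this (a sketch; the ingress-nginx namespace and controller service name are assumptions from a default install, and the wireguard names follow the mapping above):

# the controller's Service should expose 51820/UDP externally
kubectl -n ingress-nginx get svc ingress-nginx-controller -o wide

# and the ClusterIP service the proxy forwards to should exist
kubectl -n wireguard get svc wireguard-wireguard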

yepchaos · Nov 17 '24

Are you running any dashboards using your Cilium metrics? If so, do you see the dropped-flow metric tick up when you try to access DNS from your client?

See https://docs.cilium.io/en/stable/observability/metrics/#drops-forwards-l3-l4; the reason label on the drop counter will show flows dropped due to policy.

What I'm wondering is whether the identity attached to the flow differs from the server pod's and is being rejected by Cilium network policies. Are you running Cilium in implicit deny mode, and/or do you have network policies defined in kube-system and/or in the namespace WireGuard is running in?
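
One way to watch for policy drops while the client retries DNS, and to list any policies in play (a sketch; the exec targets an arbitrary agent pod in the daemonset, ideally pick the one on the node hosting the wireguard pod):

# stream only dropped packets from the Cilium agent
kubectl -n kube-system exec -it ds/cilium -- cilium monitor --type drop

# list Cilium and Kubernetes network policies in all namespaces
kubectl get ciliumnetworkpolicies,ciliumclusterwidenetworkpolicies,networkpolicies -A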

bryopsida · Nov 23 '24

Hello, I also have DNS resolution problems at the WireGuard client level, while the pod itself can resolve; Cilium is the CNI in my Kubernetes cluster. Should I add more info here or create a new issue?

One thing I noticed using Hubble: the traffic that should be routed to the internal CoreDNS service for name resolution is shown with target "world", which may be a hint at a routing issue I fail to understand.
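
The flows in question can be pulled up with a filter along these lines (a sketch; it assumes the hubble CLI is pointed at the relay, and flags may differ slightly by version):

# follow DNS traffic and watch which identities/verdicts it gets
hubble observe -f --protocol UDP --port 53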

Edit: I'm a tiny step further. Using the Cilium pod in the kube-system namespace that runs on the same node, I can see the difference between nslookup calls issued directly in the WireGuard pod and calls issued from a WireGuard client, via kubectl exec -it cilium-vjbzj -- cilium monitor | grep -i "10.0.19.39.*udp" (the WireGuard pod has the IP 10.0.19.39). For the nslookup command issued in the WireGuard pod it looks like this:

#wireguard pod
nslookup kube-prometheus-stack-coredns.kube-system.svc.cluster.local
Server:         10.0.8.10
Address:        10.0.8.10:53


Name:   kube-prometheus-stack-coredns.kube-system.svc.cluster.local
Address: 10.0.17.162
Name:   kube-prometheus-stack-coredns.kube-system.svc.cluster.local
Address: 10.0.20.199

and the output of cilium monitor:

-> overlay flow 0xa100bf03 , identity 24347->39778 state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.19.39:58662 -> 10.0.17.162:53 udp
-> endpoint 2517 flow 0x0 , identity 39778->24347 state reply ifindex lxc62a83218a13c orig-ip 10.0.17.162: 10.0.17.162:53 -> 10.0.19.39:58662 udp

So the DNS lookup is routed directly to the CoreDNS pod IPs rather than the Kubernetes service, but if a WireGuard client does the lookup against 10.0.8.10 it gets stuck there:

nslookup kube-dns.kube-system.svc.cluster.local 10.0.8.10
;; communications error to 10.0.8.10#53: timed out
;; communications error to 10.0.8.10#53: timed out
;; communications error to 10.0.8.10#53: timed out
;; no servers could be reached

cilium monitor:

-> stack flow 0x0 , identity 24347->world state new ifindex 0 orig-ip 0.0.0.0: 10.0.19.39:12773 -> 10.0.8.10:53 udp
-> stack flow 0x0 , identity 24347->world state new ifindex 0 orig-ip 0.0.0.0: 10.0.19.39:6499 -> 10.0.8.10:53 udp
-> stack flow 0x0 , identity 24347->world state new ifindex 0 orig-ip 0.0.0.0: 10.0.19.39:3002 -> 10.0.8.10:53 udp

By the way, querying the pod IP directly over WireGuard works just fine:

nslookup kube-dns.kube-system.svc.cluster.local 10.0.17.162
Server:         10.0.17.162
Address:        10.0.17.162#53

Name:   kube-dns.kube-system.svc.cluster.local
Address: 10.0.8.10

and the corresponding cilium monitor output:

-> overlay flow 0x0 , identity 24347->39778 state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.19.39:8962 -> 10.0.17.162:53 udp
-> endpoint 2517 flow 0x0 , identity 39778->24347 state reply ifindex lxc62a83218a13c orig-ip 10.0.17.162: 10.0.17.162:53 -> 10.0.19.39:8962 udp
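
That identity 24347->world on the service IP suggests the lookup to 10.0.8.10 is not being translated to a backend for traffic entering over the tunnel, while the pod-IP path goes through the overlay as expected. One way to check how the agent maps that ClusterIP to backends (a sketch, reusing the agent pod from above; both subcommands are from the in-pod cilium CLI):

# service-to-backend mapping as Cilium sees it
kubectl -n kube-system exec -it cilium-vjbzj -- cilium service list | grep 10.0.8.10

# the BPF load-balancer entries for the same ClusterIP
kubectl -n kube-system exec -it cilium-vjbzj -- cilium bpf lb list | grep 10.0.8.10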

Cardes · Apr 10 '25