
Make telepresence discover that DNS IP is in conflict with a proxied subnet

Open thallgren opened this issue 3 years ago • 3 comments

Background

Users lose network connectivity after running telepresence connect. This happens when the local DNS is configured to use a server IP that falls inside a subnet routed through Telepresence's TUN device.

A common scenario is an EC2 instance that uses a DNS server at 172.31.0.2 (a common default nameserver). Typical output from resolvectl dns:

$ resolvectl dns
Global:
Link 2 (ens5): 172.31.0.2

When Telepresence connects to an EKS cluster, the cluster's pod subnet contains that DNS IP. The log shows:

<timestamp> info    daemon/watch-cluster-info : Adding pod subnet 172.31.0.0/18

While connected, all requests to 172.31.0.2 are routed to the cluster, and the cluster doesn't have a DNS server at that IP. All requests for names covered by the exclude-suffix list will then fail (names not covered are routed to the cluster's proper DNS and will likely succeed).
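The conflict described above is a plain containment check: is the DNS server IP a member of any proxied subnet? A minimal sketch using Python's standard ipaddress module follows. The function name find_dns_conflicts is hypothetical and is not part of Telepresence; the sample values are taken from the resolvectl output and the log line above.

```python
import ipaddress


def find_dns_conflicts(dns_servers, proxied_subnets):
    """Return (dns_ip, subnet) pairs where a DNS server IP lies inside a proxied subnet.

    Hypothetical helper for illustration only; not a Telepresence API.
    """
    conflicts = []
    for server in dns_servers:
        ip = ipaddress.ip_address(server)
        for cidr in proxied_subnets:
            net = ipaddress.ip_network(cidr)
            # Membership tests across IP versions evaluate to False,
            # so mixed IPv4/IPv6 input is safe here.
            if ip in net:
                conflicts.append((str(ip), str(net)))
    return conflicts


if __name__ == "__main__":
    # DNS server from `resolvectl dns`, pod subnet from the daemon log.
    print(find_dns_conflicts(["172.31.0.2"], ["172.31.0.0/18"]))
```

With the values from this report the check finds one conflict, because 172.31.0.0/18 spans 172.31.0.0 through 172.31.63.255.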

Workaround

Adding the local DNS IP to the never-proxy list as 172.31.0.2/32 solves the problem. The DNS queries are then no longer routed to the cluster and instead proceed to their original network destination.

Desired improvement

Telepresence should detect this situation and automatically add the conflicting entry to the never-proxy list.
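One way to picture the requested behaviour is sketched below, assuming the daemon knows the local DNS server IPs, the proxied subnets, and the current never-proxy list. The function extend_never_proxy and its parameters are hypothetical; this is not how Telepresence implements it.

```python
import ipaddress


def extend_never_proxy(dns_servers, proxied_subnets, never_proxy):
    """Return never_proxy plus a single-host entry for each conflicting DNS server.

    Hypothetical sketch; names and data shapes are assumptions.
    """
    result = list(never_proxy)
    excluded = [ipaddress.ip_network(c) for c in never_proxy]
    subnets = [ipaddress.ip_network(c) for c in proxied_subnets]
    for server in dns_servers:
        ip = ipaddress.ip_address(server)
        # Skip servers that an existing never-proxy entry already covers.
        if any(ip in net for net in excluded):
            continue
        if any(ip in net for net in subnets):
            # /32 for IPv4, /128 for IPv6.
            host = ipaddress.ip_network(f"{ip}/{ip.max_prefixlen}")
            result.append(str(host))
            excluded.append(host)
    return result


if __name__ == "__main__":
    print(extend_never_proxy(["172.31.0.2"], ["172.31.0.0/18"], []))
```

Adding a single-host entry rather than a wider range keeps the rest of the pod subnet proxied, which matches the manual workaround.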

Alternative

The telepresence test-vpn command could detect this conflict and suggest the aforementioned workaround.
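A rough sketch of such a diagnostic is shown below. It assumes the resolvectl dns output has the shape shown earlier in this issue (a label, a colon, then whitespace-separated server addresses) and that the proxied subnets are already known. Both function names are hypothetical, and nothing here reflects what test-vpn actually does.

```python
import ipaddress


def parse_resolvectl_dns(output):
    """Extract DNS server IPs from `resolvectl dns` output.

    Assumes lines like "Link 2 (ens5): 172.31.0.2"; tokens that are not
    plain IP addresses are skipped.
    """
    servers = []
    for line in output.splitlines():
        # Everything after the first colon is the server list for that link.
        _label, _sep, rest = line.partition(":")
        for token in rest.split():
            try:
                servers.append(ipaddress.ip_address(token))
            except ValueError:
                continue
    return servers


def suggest_workaround(output, proxied_subnets):
    """Return one hint per DNS server that lies inside a proxied subnet."""
    subnets = [ipaddress.ip_network(c) for c in proxied_subnets]
    hints = []
    for ip in parse_resolvectl_dns(output):
        for net in subnets:
            if ip in net:
                hints.append(
                    f"DNS server {ip} is inside proxied subnet {net}; "
                    f"add {ip}/{ip.max_prefixlen} to never-proxy"
                )
    return hints


if __name__ == "__main__":
    sample = "Global:\nLink 2 (ens5): 172.31.0.2\n"
    for hint in suggest_workaround(sample, ["172.31.0.0/18"]):
        print(hint)
```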

thallgren, Feb 22 '22 16:02

Thanks @thallgren for filing this. One more ask: Please default route entries for the VPN gateway. My SSH finally worked after this :) More context in this issue.

bhavitsharma, Feb 23 '22 23:02

And thanks again for the very prompt responses and for finding the solution!

bhavitsharma, Feb 23 '22 23:02

@bhavitsharma you're more than welcome. Your input was very helpful in finding the root cause. And now you've also verified that the workaround indeed solves your problem. Thanks!

thallgren, Feb 24 '22 06:02