
DNS resolution in alpine (musl) based containers fails when the host system has `search .` in `resolv.conf` with 1.25.0

Open ekeih opened this issue 3 years ago • 10 comments

Environmental Info: K3s Version:

/opt/k3s -v
k3s version v1.25.0+k3s1 (26e94057)
go version go1.19

Node(s) CPU architecture, OS, and Version:

uname -a
Linux alderaan 5.15.0-47-generic #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

Cluster Configuration:

  • 1 server (single-node cluster)

Describe the bug:

Alpine-based containers have problems with DNS resolution.

The cluster (the single node) is currently running 43 pods, and DNS resolution works fine for 39 of them. The 4 pods with DNS issues have one thing in common: they are based on Alpine images. The problem is reproducible by starting an Alpine-based pod, see below.

Steps To Reproduce:

k3s is installed by downloading the binary to /opt/k3s and creating the following systemd unit:

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target
Wants=network-online.target

[Service]
Type=notify

ExecStart=/opt/k3s server \
          --disable traefik \
          --node-ip=10.14.68.1 \
          --advertise-address=10.14.68.1 \
          --node-external-ip=192.168.24.5 \
          --kube-apiserver-arg="enable-admission-plugins=DefaultTolerationSeconds" \
          --kube-apiserver-arg="default-not-ready-toleration-seconds=14400" \
          --kube-apiserver-arg="default-unreachable-toleration-seconds=14400" \
          --kubelet-arg="image-gc-high-threshold=70" \
          --kubelet-arg="image-gc-low-threshold=60" \
          --disable-network-policy \
          --disable-kube-proxy \
          --flannel-backend=none \
          --cluster-cidr=10.12.0.0/16 \
          --service-cidr=10.13.0.0/16 \
          --cluster-dns=10.13.0.10

KillMode=process
Delegate=yes
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
  • Cilium 1.12.1 as CNI
    • This has been running fine for months with k3s 1.24
    • Cilium itself reports as healthy
    • Cluster network connectivity looks fine
    • Currently I don't think Cilium is the issue, but it could be some interaction between Cilium and k3s 1.25.
  • Using the bundled coredns deployment with coredns 1.9.1
    • Updating it to 1.9.4 made no difference
kubectl run --rm -it --image alpine debug-alpine
If you don't see a command prompt, try pressing enter.
# DNS config of the pod

/ # cat /etc/resolv.conf
search kube-system.svc.cluster.local svc.cluster.local cluster.local .
nameserver 10.13.0.10
options ndots:5
# Network connectivity to the outside works

/ # ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: seq=0 ttl=54 time=20.219 ms
^C
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 20.219/20.219/20.219 ms
# nslookup works

/ # nslookup google.com
Server:		10.13.0.10
Address:	10.13.0.10:53

Non-authoritative answer:
Name:	google.com
Address: 142.250.186.174

Non-authoritative answer:
Name:	google.com
Address: 2a00:1450:4001:82b::200e
# DNS resolution of ping does not work

/ # ping google.com
ping: bad address 'google.com'
# apk (Alpine's package manager) can't resolve hostnames

/ # apk update
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/main/x86_64/APKINDEX.tar.gz
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.16/main: temporary error (try again later)
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.16/main: No such file or directory
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/community/x86_64/APKINDEX.tar.gz
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.16/community: temporary error (try again later)
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.16/community: No such file or directory
2 errors; 14 distinct packages available

Doing the same with an ubuntu image works just fine.

Expected behavior: DNS in alpine containers should work normally, e.g. ping google.com should work.

Actual behavior: Some DNS queries in Alpine-based images do not work. It seems to depend on the tool: e.g. ping fails while nslookup works, so they probably handle DNS resolution differently.
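
For reference, a quick way to see the per-tool difference from inside the test pod. This is only a sketch; it assumes that busybox nslookup queries the nameserver directly, while ping and wget resolve names via musl's getaddrinfo and therefore the search list:

/ # nslookup google.com                       # works
/ # ping -c 1 google.com                      # fails with "bad address"
/ # wget -q -O /dev/null http://google.com    # expected to fail the same way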

Additional context / logs:

  • I did not find any helpful logs so far. Let me know what information might help.
  • You might wonder why I create a k3s issue when the common denominator is alpine?
    • The alpine based pods have been working without any issues for months.
    • The only thing I changed today was to update to k3s 1.25, so it looks a lot like that triggered the issue.

ekeih avatar Sep 14 '22 00:09 ekeih

When running tcpdump on the node while running ping google.com in an alpine pod, I noticed that it tries to resolve the name as a cluster-internal service first. That is normal in general, but the last queries in the output look weird:

00:11:31.674603 lxce1d3b07301ea In  IP 10.12.0.100.45195 > 10.12.0.116.domain: 3842+ A? google.com.kube-system.svc.cluster.local. (58)
00:11:31.674652 lxcc486f7ab7d5a Out IP 10.12.0.100.45195 > 10.12.0.116.domain: 3842+ A? google.com.kube-system.svc.cluster.local. (58)
00:11:31.674764 lxce1d3b07301ea In  IP 10.12.0.100.45195 > 10.12.0.116.domain: 4964+ AAAA? google.com.kube-system.svc.cluster.local. (58)
00:11:31.674769 lxcc486f7ab7d5a Out IP 10.12.0.100.45195 > 10.12.0.116.domain: 4964+ AAAA? google.com.kube-system.svc.cluster.local. (58)
00:11:31.675051 lxcc486f7ab7d5a In  IP 10.12.0.116.domain > 10.12.0.100.45195: 3842 NXDomain*- 0/1/0 (151)
00:11:31.675086 lxce1d3b07301ea Out IP 10.12.0.116.domain > 10.12.0.100.45195: 3842 NXDomain*- 0/1/0 (151)
00:11:31.675324 lxcc486f7ab7d5a In  IP 10.12.0.116.domain > 10.12.0.100.45195: 4964 NXDomain*- 0/1/0 (151)
00:11:31.675331 lxce1d3b07301ea Out IP 10.12.0.116.domain > 10.12.0.100.45195: 4964 NXDomain*- 0/1/0 (151)
00:11:31.675398 lxce1d3b07301ea In  IP 10.12.0.100.42548 > 10.12.0.116.domain: 26509+ A? google.com.svc.cluster.local. (46)
00:11:31.675414 lxcc486f7ab7d5a Out IP 10.12.0.100.42548 > 10.12.0.116.domain: 26509+ A? google.com.svc.cluster.local. (46)
00:11:31.675451 lxce1d3b07301ea In  IP 10.12.0.100.42548 > 10.12.0.116.domain: 27091+ AAAA? google.com.svc.cluster.local. (46)
00:11:31.675455 lxcc486f7ab7d5a Out IP 10.12.0.100.42548 > 10.12.0.116.domain: 27091+ AAAA? google.com.svc.cluster.local. (46)
00:11:31.675918 lxcc486f7ab7d5a In  IP 10.12.0.116.domain > 10.12.0.100.42548: 27091 NXDomain*- 0/1/0 (139)
00:11:31.675943 lxce1d3b07301ea Out IP 10.12.0.116.domain > 10.12.0.100.42548: 27091 NXDomain*- 0/1/0 (139)
00:11:31.676002 lxcc486f7ab7d5a In  IP 10.12.0.116.domain > 10.12.0.100.42548: 26509 NXDomain*- 0/1/0 (139)
00:11:31.676016 lxce1d3b07301ea Out IP 10.12.0.116.domain > 10.12.0.100.42548: 26509 NXDomain*- 0/1/0 (139)
00:11:31.676207 lxce1d3b07301ea In  IP 10.12.0.100.58751 > 10.12.0.116.domain: 57219+ A? google.com.cluster.local. (42)
00:11:31.676241 lxcc486f7ab7d5a Out IP 10.12.0.100.58751 > 10.12.0.116.domain: 57219+ A? google.com.cluster.local. (42)
00:11:31.676274 lxce1d3b07301ea In  IP 10.12.0.100.58751 > 10.12.0.116.domain: 57766+ AAAA? google.com.cluster.local. (42)
00:11:31.676280 lxcc486f7ab7d5a Out IP 10.12.0.100.58751 > 10.12.0.116.domain: 57766+ AAAA? google.com.cluster.local. (42)
00:11:31.676508 lxcc486f7ab7d5a In  IP 10.12.0.116.domain > 10.12.0.100.58751: 57766 NXDomain*- 0/1/0 (135)
00:11:31.676533 lxce1d3b07301ea Out IP 10.12.0.116.domain > 10.12.0.100.58751: 57766 NXDomain*- 0/1/0 (135)
00:11:31.676582 lxcc486f7ab7d5a In  IP 10.12.0.116.domain > 10.12.0.100.58751: 57219 NXDomain*- 0/1/0 (135)
00:11:31.676587 lxce1d3b07301ea Out IP 10.12.0.116.domain > 10.12.0.100.58751: 57219 NXDomain*- 0/1/0 (135)
00:11:31.676831 lxce1d3b07301ea In  IP 10.12.0.100.47663 > 10.12.0.116.domain: 26914+ [|domain]
00:11:31.676861 lxcc486f7ab7d5a Out IP 10.12.0.100.47663 > 10.12.0.116.domain: 26914+ [|domain]
00:11:31.676890 lxce1d3b07301ea In  IP 10.12.0.100.47663 > 10.12.0.116.domain: 27453+ [|domain]
00:11:31.676895 lxcc486f7ab7d5a Out IP 10.12.0.100.47663 > 10.12.0.116.domain: 27453+ [|domain]
00:11:31.676994 lxcc486f7ab7d5a In  IP 10.12.0.116.domain > 10.12.0.100.47663: 27453 FormErr- [0q] 0/0/0 (12)
00:11:31.677007 lxce1d3b07301ea Out IP 10.12.0.116.domain > 10.12.0.100.47663: 27453 FormErr- [0q] 0/0/0 (12)
00:11:31.677044 lxcc486f7ab7d5a In  IP 10.12.0.116.domain > 10.12.0.100.47663: 26914 FormErr- [0q] 0/0/0 (12)
00:11:31.677049 lxce1d3b07301ea Out IP 10.12.0.116.domain > 10.12.0.100.47663: 26914 FormErr- [0q] 0/0/0 (12)

For nslookup google.com the tcpdump output looks different: it queries the correct domain directly, which explains why it works. nslookup is probably not applying the search domains from /etc/resolv.conf at all:

00:31:55.535780 lxce1d3b07301ea In  IP 10.12.0.100.42053 > 10.12.0.116.53: 47584+ A? google.com. (28)
00:31:55.535815 lxcc486f7ab7d5a Out IP 10.12.0.100.42053 > 10.12.0.116.53: 47584+ A? google.com. (28)
00:31:55.535839 lxce1d3b07301ea In  IP 10.12.0.100.42053 > 10.12.0.116.53: 48454+ AAAA? google.com. (28)
00:31:55.535844 lxcc486f7ab7d5a Out IP 10.12.0.100.42053 > 10.12.0.116.53: 48454+ AAAA? google.com. (28)
00:31:55.536115 lxcc486f7ab7d5a In  IP 10.12.0.116.57822 > 192.168.24.1.53: 36199+ [1au] AAAA? google.com. (39)
00:31:55.536168 enp88s0 Out IP 192.168.24.5.57822 > 192.168.24.1.53: 36199+ [1au] AAAA? google.com. (39)
00:31:55.536206 lxcc486f7ab7d5a In  IP 10.12.0.116.38497 > 192.168.24.1.53: 11420+ [1au] A? google.com. (39)
00:31:55.536251 enp88s0 Out IP 192.168.24.5.38497 > 192.168.24.1.53: 11420+ [1au] A? google.com. (39)
00:31:55.538041 enp88s0 In  IP 192.168.24.1.53 > 192.168.24.5.57822: 36199 1/0/1 AAAA 2a00:1450:4001:82b::200e (67)
00:31:55.538070 lxcc486f7ab7d5a Out IP 192.168.24.1.53 > 10.12.0.116.57822: 36199 1/0/1 AAAA 2a00:1450:4001:82b::200e (67)
00:31:55.538227 lxcc486f7ab7d5a In  IP 10.12.0.116.53 > 10.12.0.100.42053: 48454 1/0/0 AAAA 2a00:1450:4001:82b::200e (66)
00:31:55.538238 enp88s0 In  IP 192.168.24.1.53 > 192.168.24.5.38497: 11420 1/0/1 A 142.250.186.174 (55)
00:31:55.538248 lxcc486f7ab7d5a Out IP 192.168.24.1.53 > 10.12.0.116.38497: 11420 1/0/1 A 142.250.186.174 (55)
00:31:55.538251 lxce1d3b07301ea Out IP 10.12.0.116.53 > 10.12.0.100.42053: 48454 1/0/0 AAAA 2a00:1450:4001:82b::200e (66)
00:31:55.538472 lxcc486f7ab7d5a In  IP 10.12.0.116.53 > 10.12.0.100.42053: 47584 1/0/0 A 142.250.186.174 (54)
00:31:55.538482 lxce1d3b07301ea Out IP 10.12.0.116.53 > 10.12.0.100.42053: 47584 1/0/0 A 142.250.186.174 (54)

Based on that I also noticed that ping google.com. (note the dot at the end) works in the container.

ekeih avatar Sep 14 '22 00:09 ekeih

I compared the /etc/resolv.conf files of a pod in a 1.24 and a 1.25 cluster and noticed a minimal difference. Both pods were started with kubectl run --rm -it --image alpine -- debug:

1.24: search kube-system.svc.cluster.local svc.cluster.local cluster.local -> no issues
1.25: search kube-system.svc.cluster.local svc.cluster.local cluster.local . -> has issues

So the single dot at the end seems to cause the trouble.

To verify it I configured the DNS config of one of the alpine based pods to get rid of that trailing dot in 1.25:

spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 10.13.0.10
    searches:
      - kube-system.svc.cluster.local
      - svc.cluster.local
      - cluster.local
    options:
      - name: ndots
        value: "5"

With those settings the DNS resolution inside of that alpine based pod works as expected.
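
For ad-hoc testing, the same override can probably be passed directly to kubectl run. This is just a sketch reusing the cluster DNS IP from above; --overrides takes an inline JSON patch of the generated pod and requires an apiVersion field:

kubectl run --rm -it --image alpine debug-alpine --overrides='
{
  "apiVersion": "v1",
  "spec": {
    "dnsPolicy": "None",
    "dnsConfig": {
      "nameservers": ["10.13.0.10"],
      "searches": ["kube-system.svc.cluster.local", "svc.cluster.local", "cluster.local"],
      "options": [{"name": "ndots", "value": "5"}]
    }
  }
}'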

ekeih avatar Sep 14 '22 01:09 ekeih

Digging further, I noticed that the /etc/resolv.conf on the node itself contained `search .`.

  • This was already the case while the node was running 1.24, so the handling of it seems to have changed with 1.25.
  • Changing the node's /etc/resolv.conf to something like `search foo.local` and restarting the alpine based pods fixes the issue. foo.local is then also added to the search domains inside the pods, but e.g. for ping google.com a tcpdump shows that all the cluster-internal service names are tried first, then google.com.foo.local, and finally google.com, which succeeds.
  • Where does the . in the node's /etc/resolv.conf come from? It looks like systemd-networkd sets `search .` when no other search domain is configured (by default it does not use search domains provided via DHCP). So far I have found no way to tell systemd-networkd to set neither a search domain nor `search .`, so as a workaround I configure Domains=somelocaldomain in its config, which leads to `search somelocaldomain` on the node (see the sketch after this list).
  • Because systemd-networkd itself sets . as a default, I assume this is a valid configuration in general.
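
A rough sketch of that workaround (the .network file name below is just an example; enp88s0 is the uplink interface from the tcpdump above):

# /etc/systemd/network/20-wired.network (example file name)
[Match]
Name=enp88s0

[Network]
DHCP=yes
# an explicit domain here prevents the "search ." fallback
Domains=somelocaldomain

After restarting systemd-networkd, grep '^search' /etc/resolv.conf on the node should show search somelocaldomain instead of search ., and restarted pods pick up the new search list.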

I am currently not sure which component should be considered to have an issue here:

  • k3s 1.25: adding the . to the container?
  • alpine: not handling the . correctly inside of the container?
  • systemd-networkd: setting the search . on the host system in the first place?

I am looking forward to input from other people about this. Thanks in advance! :)

ekeih avatar Sep 14 '22 01:09 ekeih

It looks like this was introduced upstream with 1.25.0 (https://github.com/kubernetes/kubernetes/pull/109441) and will be fixed upstream with 1.25.1 (https://github.com/kubernetes/kubernetes/pull/112157).

(Edit) Additional links:

  • https://github.com/coreos/fedora-coreos-tracker/issues/1287
  • https://github.com/systemd/systemd/pull/17201
  • https://www.openwall.com/lists/musl/2022/08/31/5

ekeih avatar Sep 14 '22 09:09 ekeih

I am not sure if you want to keep this open until it is fixed so other people can find it or if you want to close it because it will be fixed upstream? Either way, I hope the information saves other people a few hours of debugging :)

P.S. k3s is awesome, thanks everyone for working on it! 🚀

ekeih avatar Sep 14 '22 09:09 ekeih

Thanks for documenting this and describing how you were debugging it @ekeih! Yes, let's keep this open and close it as soon as we move to 1.25.1 and verify that it is fixed.

manuelbuil avatar Sep 14 '22 09:09 manuelbuil

Thanks so much for the kind words and for all the info @ekeih! As Manuel stated, let's keep this open until we release 1.25.1 next week(ish), to make sure our QA can validate the fix. Thanks again!

cwayne18 avatar Sep 14 '22 14:09 cwayne18

I ran into very similar issues when testing K3s v1.25 on Equinix Metal yesterday, on Ubuntu 22.04.

(From helm)

FATA[0001] rpc error: code = InvalidArgument desc = application spec for openfaas-operator is invalid: InvalidSpecError: repository not accessible:
rpc error: code = Unknown desc = error testing repository connectivity: Get "https://github.com/openfaas/faas-netes.git/info/refs?service=git-upload-pack":
x509: certificate is valid for cloudfront.net, *.cloudfront.net, not github.com

I also saw that ping and curl to Google failed from ghcr.io/openfaas/curl:latest (started with kubectl run), with errors about CloudFront and incorrect TLS cert names. An up-to-date CA certificates bundle was available in the container.

Do you think it's related?

alexellis avatar Sep 21 '22 07:09 alexellis

No, that looks like you have something in your environment (an upstream DNS server with ad-blocking capabilities perhaps) that is redirecting your requests to CloudFront instead of the requested domain.
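
One quick way to check that (just a sketch, not specific to any provider) is to compare what the node's configured resolver returns with a public one:

nslookup github.com             # uses the resolver from /etc/resolv.conf
nslookup github.com 1.1.1.1     # bypasses it; differing answers point at the local resolver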

brandond avatar Sep 21 '22 07:09 brandond

Thanks for the clarification. This was just a regular bare-metal host on Equinix Metal. I tried out two different regions, but did nothing differently (IMHO) between installing on Equinix Metal and Linode. On Linode it worked without any issues.

alexellis avatar Sep 21 '22 08:09 alexellis

This has been fixed and validated in v1.25.2-rc1+k3s1 via the new k8s version. Thank you for all the details on this issue! You will see the fixes when the release is promoted 🎉

rancher-max avatar Sep 26 '22 21:09 rancher-max

Hello, I'm experiencing a DNS resolution issue with Alpine-based containers on my Ubuntu 22.04 system running K3s version 1.27.9+k3s1, similar to the one previously reported in issue #6132. I've applied a fix by setting dnsPolicy to None and manually configuring dnsConfig with specific nameservers, searches, and options. This workaround is effective, but I'm wondering if there's an alternative solution that doesn't require such specific DNS configuration adjustments. Has anyone else faced this with the same K3s version and found a different workaround? Thank you for any suggestions or guidance.

anastas001 avatar Feb 08 '24 19:02 anastas001

@anastas001 please open a new issue. This was a regression in upstream Kubernetes that has long been resolved.

brandond avatar Feb 08 '24 19:02 brandond