kubeone
Hetzner DNS Issue (resolv-conf)
What happened: I followed this guide: https://www.kubermatic.com/blog/kubernetes-on-hetzner-with-kubermatic-kubeone-in-2021/ Everything ran through fine.
However, the DNS Servers are not configured properly.
Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is:
The cause is: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues
Adding the flag --resolv-conf with the value /run/systemd/resolve/resolv.conf should do the trick.
How can I achieve this?
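For context, I assume the manual per-node fix would be a kubelet systemd drop-in roughly like this (just a sketch with a made-up file name; I'd rather not maintain this by hand on every node):
# /etc/systemd/system/kubelet.service.d/20-resolv-conf.conf (hypothetical path)
[Service]
Environment="KUBELET_EXTRA_ARGS=--resolv-conf=/run/systemd/resolve/resolv.conf"
# then: systemctl daemon-reload && systemctl restart kubelet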
Information about the environment:
┬─[xxx@Archie:~/D/k/e/t/hetzner] (21:xx:07)
╰─>$ kubeone version
{
  "kubeone": {
    "major": "1",
    "minor": "3",
    "gitVersion": "v1.3.0",
    "gitCommit": "bfe6683334acdbb1a1d9cbbb2d5d5095f6f0111e",
    "gitTreeState": "",
    "buildDate": "2021-09-15T22:12:12+00:00",
    "goVersion": "go1.17.1",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "35",
    "gitVersion": "v1.35.2",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
OS: Arch Linux
Kubernetes: 1.22.5
Provider: Hetzner Cloud
According to that documentation, kubeadm (which is used by kubeone) should automatically detect and set the resolv-conf flag if necessary. I'm also not entirely sure how it would solve the problem here. What component is reporting that error message you've shared? How many DNS servers are distributed via your DHCP / networking settings?
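On one of the nodes, something like this should show which DNS servers the machine actually received (resolvectl is part of systemd-resolved, which Ubuntu uses by default):
resolvectl status
cat /run/systemd/resolve/resolv.conf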
The error pops up in every essential pod that requires DNS (network pods, DNS pods). I have just created a blank cluster with kubeone (nothing deployed on top). It seems like Hetzner changed their DNS settings because of the new "IPv6 only" capability.
Logs:
DNS Settings from one of the masters:
cat /etc/resolv.conf
nameserver 127.0.0.53
options edns0 trust-ad
cat /run/systemd/resolve/resolv.conf
nameserver 2a01:4ff:ff00::add:1
nameserver 2a01:4ff:ff00::add:2
nameserver 185.12.64.2
# Too many DNS servers configured, the following entries may be ignored.
nameserver 185.12.64.1
The file itself is reporting the issue... lol. I also opened a Hetzner ticket.
Removing both IPv6 resolvers solved the issue. However, the main goal of kubeone is to provide some kind of automatic installation/configuration. :) Thanks for looking into it...
I can confirm this happens. Here's a couple of notes on this:
- This is a built-in limitation of glibc, it's really baked into the Linux ecosystem itself. You usually do not distribute more than three DNS servers for that very reason.
- This is not an error or a KubeOne bug, it's a warning. DNS still works on KubeOne clusters set up with Hetzner, but it seems to rely on a single IPv4 server, since the IPv6 DNS servers do not seem accessible from within a Pod (this was just a quick check from a random pod, roughly as sketched after this list; no guarantees this applies to CoreDNS, but I expect it to). That is not ideal, but nothing is "broken".
- Hetzner does not seem to provide configuration interfaces to change what DNS is distributed in their networks.
- KubeOne does not manage/optimize/configure the underlying infrastructure for the control plane. Also see https://docs.kubermatic.com/kubeone/v1.3/architecture/requirements/infrastructure_management/#infrastructure-for-control-plane. The Terraform examples (including for Hetzner) are just examples and not considered production-ready. While it's not spelled out, I think it's fair to say that "valid DNS configuration" is a common expectation here, and KubeOne cannot work around such idiosyncrasies.
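For reference, the quick check was roughly the following; the pod image and the exact behaviour are from memory, so treat it as a sketch rather than an exact reproduction:
# throwaway pod querying the Hetzner resolvers directly
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.io 185.12.64.2
# the IPv4 resolver answers; the same query against 2a01:4ff:ff00::add:1 times out from inside the Pod network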
To conclude: For now, you'll need to change your infrastructure provisioning (by adding some scripts to your Terraform provisioning, for example) if you want to get rid of that. Getting in touch with Hetzner and asking them to provide some configurability here is a good move as well.
I'll keep this ticket open as I could see a potential feature for providing a separate resolv.conf to kubelets as a generic way to solve such problems, but I'd have to check internally if this is a feature we want to add.
Thanks for the clarification. I'm in contact with Hetzner, let's see what comes out of it.
I hope it will be fixed soon on the Hetzner side. But until then, you may use the following cloud-init user data configuration, which overrides the default resolv.conf file.
#cloud-config
write_files:
  - content: |
      nameserver 185.12.64.2
      nameserver 185.12.64.1
    path: /run/systemd/resolve/resolv.conf
https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs/resources/server#argument-reference
@randrusiak, do you know where to define this for Kubermatic (KKP) created clusters? With a KubeOne installation it should be possible.
Unfortunately, I don't have any experience with Kubermatic (KKP). By the way, the above example isn't enough, because after restarting the systemd-resolved service the resolv.conf configuration is restored to its default values.
@Berndinox Any news from Hetzner? I'm experiencing the same behavior with IPv6, plus constantly crashing pods hcloud-csi-controller-0 and hcloud-csi-node-xxxxx. Is there any way to modify the cloud-init on machine-controller to fully disable IPv6 support?
Seems like they won't fix it… :(
In theory, one could add custom cloud-init userdata with /run/systemd/resolve/resolv.conf to the terraform config.
In practice, it doesn't resolve this issue because systemd-resolved also gets nameservers from the network interfaces. So basically you need to disable or remove the nameserver configuration from the network devices. I will try to dig deeper over the weekend and let you know if I find a solution :)
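One idea I want to test is overriding DNS at the netplan level, so the interface keeps its DHCP addresses but ignores the advertised nameservers. Roughly like this, with the interface name and renderer being assumptions and nothing verified end to end yet:
# hypothetical /etc/netplan/60-dns.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:                    # interface name is an assumption
      dhcp4: true
      dhcp4-overrides:
        use-dns: false
      dhcp6: true
      dhcp6-overrides:
        use-dns: false
      nameservers:
        addresses: [185.12.64.2, 185.12.64.1]
# apply with: netplan apply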
Apologies for bumping the post, but I've spent the best part of a day trying to fix this... My logs are polluted by the same "Nameserver limits" message as the OP. Does anyone have a solution to this?
@EarthlingDavey can you please check what kubelet flags you have on both control-plane and worker nodes?
I haven't specified any kubelet flags. I can reproduce with:
- Copy the example files
- kubeone.yaml:
  apiVersion: kubeone.k8c.io/v1beta2
  kind: KubeOneCluster
  versions:
    kubernetes: 1.23.7
  clusterNetwork:
    cni:
      canal:
        mtu: 1400 # Hetzner specific: 1450 bytes - 50 VXLAN bytes
  cloudProvider:
    hetzner: {}
    external: true
- terraform init
- terraform apply --auto-approve && terraform output -json > tf.json
- kubeone apply --manifest kubeone.yaml --tfjson tf.json -v -y
- kubectl get events --sort-by='{.lastTimestamp}' -A
Result is:
kube-system   22s   Warning   DNSConfigForming   pod/canal-t4rgs   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   21s   Warning   DNSConfigForming   pod/node-local-dns-p9zr2   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   16s   Warning   DNSConfigForming   pod/etcd-example-control-plane-3   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   12s   Warning   DNSConfigForming   pod/coredns-74cfdf5c5d-w2gsl   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:1 2a01:4ff:ff00::add:2 185.12.64.1
kube-system   12s   Warning   DNSConfigForming   pod/etcd-example-control-plane-1   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:1 2a01:4ff:ff00::add:2 185.12.64.1
kube-system   10s   Warning   DNSConfigForming   pod/canal-9tntj   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:1 2a01:4ff:ff00::add:2 185.12.64.1
kube-system   5s    Warning   DNSConfigForming   pod/kube-proxy-r4c99   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   4s    Warning   DNSConfigForming   pod/kube-scheduler-example-control-plane-2   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   4s    Warning   DNSConfigForming   pod/etcd-example-control-plane-2   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   3s    Warning   DNSConfigForming   pod/kube-apiserver-example-control-plane-3   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   1s    Warning   DNSConfigForming   pod/canal-rgphc   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
I can check anything else. Edit: I'm in https://kubermatic-community.slack.com/ and available for quick responses in case that helps.
Yeah, I can reproduce it too
Do you know if there is a "kubeone / machine-controller way" to set kubelet flags?
The only kubelet config I can find is in the kubeone API reference docs, but it is limited to systemReserved, kubeReserved, evictionHard and maxPods.
On control-plane and static worker nodes kubeadm sets resolvConf: /run/systemd/resolve/resolv.conf in /var/lib/kubelet/config.yaml.
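For anyone checking on their nodes, the relevant fragment looks roughly like this (abbreviated excerpt of the kubelet's KubeletConfiguration):
# /var/lib/kubelet/config.yaml (excerpt)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf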
On dynamic workers MC writes:
- path: "/etc/systemd/system/kubelet.service.d/extras.conf"
content: |
[Service]
Environment="KUBELET_EXTRA_ARGS=--resolv-conf=/run/systemd/resolve/resolv.conf"
unconditionally on ubuntu nodes.
That's interesting and helpful for coming to a solution.
But as pointed out in other comments, an edit to /run/systemd/resolve/resolv.conf is not permanent as long as systemd-resolved is active.
I got as far as adding this to control-planes:
user_data = <<EOT
#cloud-config
write_files:
  - content: |
      nameserver 185.12.64.2
      nameserver 185.12.64.1
      nameserver 2a01:4ff:ff00::add:1
    path: /etc/resolv-limit-3.conf
EOT
My next step would be to either:
- Try to update the pointer in /var/lib/kubelet/config.yaml
- Work out how to write a resolv-limit-3.conf file on dynamic workers.
- In MC, set KUBELET_EXTRA_ARGS to point to the new resolv-limit-3.conf
OR (rough sketch after this list):
- Disable systemd-resolved on control planes and workers.
- Overwrite /run/systemd/resolve/resolv.conf and not be concerned that systemd-resolved will revert these changes.
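A first stab at that second option as cloud-init, completely untested and reusing the resolv-limit-3.conf file from above:
#cloud-config
write_files:
  - path: /etc/resolv-limit-3.conf
    content: |
      nameserver 185.12.64.2
      nameserver 185.12.64.1
runcmd:
  # stop systemd-resolved so it can no longer regenerate its resolv.conf
  - systemctl disable --now systemd-resolved
  # overwrite the file kubelet points at with the trimmed nameserver list
  - cp /etc/resolv-limit-3.conf /run/systemd/resolve/resolv.conf
(/run is tmpfs, so after a reboot this would probably need to be re-applied via a systemd unit or similar.)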
Do you have any further suggestions? I'm no sysadmin, more like full-stack web.
@EarthlingDavey BTW, besides the warning spam in the logs, what other issues do you observe?
AFAICT it is just log spam. Frustrating and concerning, because I am getting about one warning every 10 seconds on a fresh install.
Pods will likely 10x when I add apps etc. to the cluster, so I expect more than one warning per second then.
Edit: thanks for the nudge @kron4eg ... I can get by with filtering out the DNSConfigForming warning events with:
kubectl get events --field-selector type!=Warning,reason!=DNSConfigForming -A
I have the same issue. I stumbled upon this thread after searching why my kubeone cluster on Hetzner Cloud is not resolving in-cluster DNS properly.
I have some pods that need to connect to a database. Therefore I am using in-cluster DNS to provide the hostname (e.g. foo-postgres.default.svc.cluster.local).
What is really weird is that none of the pods can resolve this name. However, when I install dig (e.g. from dnsutils), dig foo-postgres.default.svc.cluster.local actually resolves the name.
@lukas-at-harren I think you probably have this issue, plus another DNS issue.
Because even though I get the excessive logs reported here, my in-cluster and WAN DNS resolve fine.
Same here, ran into the same issue. Minor annoyance, but it would be nice to get rid of. Has anyone tried whether unbound could be an elegant workaround for this?
No, unbound wouldn't change anything, as the generated /etc/resolv.conf still contains too many records.
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale /lifecycle frozen