
Hetzner DNS Issue (resolv-conf)

Open Berndinox opened this issue 3 years ago • 27 comments

What happened: I followed this guide: https://www.kubermatic.com/blog/kubernetes-on-hetzner-with-kubermatic-kubeone-in-2021/ Everything ran through fine.

However, the DNS servers are not configured properly. Pods log the warning "Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: …".

The cause is described here: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues

Adding the flag --resolv-conf with the value /run/systemd/resolve/resolv.conf should do the trick.
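
For illustration, something along these lines is what I have in mind, as a kubelet systemd drop-in (sketch only; the file name is just an example):

# /etc/systemd/system/kubelet.service.d/20-resolv-conf.conf (example path)
[Service]
Environment="KUBELET_EXTRA_ARGS=--resolv-conf=/run/systemd/resolve/resolv.conf"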

How can I achieve this?

Information about the environment:

┬─[xxx@Archie:~/D/k/e/t/hetzner] (21:xx:07)
╰─>$ kubeone version
{
  "kubeone": {
    "major": "1",
    "minor": "3",
    "gitVersion": "v1.3.0",
    "gitCommit": "bfe6683334acdbb1a1d9cbbb2d5d5095f6f0111e",
    "gitTreeState": "",
    "buildDate": "2021-09-15T22:12:12+00:00",
    "goVersion": "go1.17.1",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "35",
    "gitVersion": "v1.35.2",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

OS: Arch Linux
Kubernetes: 1.22.5
Provider: Hetzner Cloud

Berndinox avatar Jan 17 '22 20:01 Berndinox

According to that documentation, kubeadm (which is used by kubeone) should automatically detect and set the resolv-conf flag if necessary. I'm also not entirely sure how it would solve the problem here. What component is reporting that error message you've shared? How many DNS servers are distributed via your DHCP / networking settings?
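
To check, something like the following on one of the nodes should show what ends up there (rough commands; on older systemd versions resolvectl is called systemd-resolve):

grep -c '^nameserver' /run/systemd/resolve/resolv.conf   # how many nameservers systemd-resolved collected
resolvectl status                                        # which servers came from which link / DHCP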

embik avatar Jan 18 '22 08:01 embik

The error pops up in every mandatory main pod that requires DNS (network pods, DNS pods). I have just created a blank cluster with KubeOne (nothing deployed on top). It seems like Hetzner changed their DNS settings because of the new "IPv6 only" capability.

Logs:

(two screenshots attached)

DNS Settings from one of the masters:

cat /etc/resolv.conf

nameserver 127.0.0.53
options edns0 trust-ad

cat /run/systemd/resolve/resolv.conf

nameserver 2a01:4ff:ff00::add:1
nameserver 2a01:4ff:ff00::add:2
nameserver 185.12.64.2
# Too many DNS servers configured, the following entries may be ignored.
nameserver 185.12.64.1

The file itself is reporting the issue... lol. I also opened a Hetzner ticket.

Removing both IPv6 resolvers solved the issue. However, the main goal of KubeOne is to provide some kind of automatic installation/configuration. :) Thanks for looking into it!

Berndinox avatar Jan 18 '22 18:01 Berndinox

I can confirm this happens. Here's a couple of notes on this:

  • This is a built-in limitation of glibc (its resolver only uses the first three nameservers); it's really baked into the Linux ecosystem itself. You usually do not distribute more than three DNS servers for that very reason.
  • This is not an error or KubeOne bug, it's a warning. DNS still works on KubeOne clusters set up with Hetzner, but it seems to rely on a single IPv4 since the IPv6 DNS servers do not seem accessible from within a Pod (this was just a quick check from a random pod; no guarantees this applies to CoreDNS, but I expect it to). That is not ideal, but nothing is "broken".
  • Hetzner does not seem to provide configuration interfaces to change what DNS is distributed in their networks.
  • KubeOne does not manage/optimize/configure the underlying infrastructure for the control plane. Also see https://docs.kubermatic.com/kubeone/v1.3/architecture/requirements/infrastructure_management/#infrastructure-for-control-plane. The Terraform examples (including for Hetzner) are just examples and not considered production-ready. While it's not spelled out, I think it's fair to say that "valid DNS configuration" is a common expectation here, and KubeOne cannot work around such idiosyncrasies.

To conclude: For now, you'll need to change your infrastructure provisioning (by adding some scripts to your Terraform provisioning, for example) if you want to get rid of that. Getting in touch with Hetzner and asking them to provide some configurability here is a good move as well.

I'll keep this ticket open as I could see a potential feature for providing a separate resolv.conf to kubelets as a generic way to solve such problems, but I'd have to check internally if this is a feature we want to add.

embik avatar Jan 19 '22 08:01 embik

Thanks for the clarification. I'm in contact with Hetzner, let's see what comes out of it.

Berndinox avatar Jan 20 '22 08:01 Berndinox

I hope it will be fixed soon on the Hetzner side. Until then, you may use the following cloud-init user data configuration, which overrides the default resolv.conf file.

#cloud-config
write_files:
- content: |
    nameserver 185.12.64.2
    nameserver 185.12.64.1
  path: /run/systemd/resolve/resolv.conf

https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs/resources/server#argument-reference
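
Wiring that up in the Terraform example could look roughly like this (sketch only; the resource name, server attributes and file name are placeholders):

resource "hcloud_server" "control_plane" {
  name        = "control-plane-1"
  image       = "ubuntu-20.04"
  server_type = "cx21"
  # pass the #cloud-config above as user data
  user_data   = file("${path.module}/cloud-init-dns.yaml")
}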

randrusiak avatar Feb 02 '22 11:02 randrusiak

@randrusiak, do you know where to define this for Kubermatic (KKP) created clusters? With a KubeOne installation it seems to be possible.

gigo1980 avatar Feb 10 '22 15:02 gigo1980

Unfortunately, I don't have any experience with Kubermatic (KKP). By the way, the above example isn't enough, because after restarting the systemd-resolved service the resolv.conf configuration is restored to its default values.

randrusiak avatar Feb 11 '22 18:02 randrusiak

@Berndinox Any news from Hetzner? I'm experiencing the same behavior with IPv6, plus constantly crashing hcloud-csi-controller-0 and hcloud-csi-node-xxxxx pods. Is there any way to modify the cloud-init on machine-controller to fully disable IPv6 support?

gris-gris avatar Feb 16 '22 14:02 gris-gris

Seems like they won't fix it… :(

Berndinox avatar Feb 16 '22 14:02 Berndinox

In theory, one could add custom cloud-init user data that writes /run/systemd/resolve/resolv.conf to the Terraform config.

kron4eg avatar Feb 18 '22 11:02 kron4eg

In practice, it doesn't resolve this issue, because systemd-resolved also gets nameservers from the network interfaces. So basically you need to disable or remove the nameserver configuration from the network devices. I will try to dig deeper at the weekend and I'll let you know if I find a solution :)
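
One direction that might work (untested; assuming the Ubuntu images use netplan with the systemd-networkd renderer, and eth0 is just a placeholder for the interface name) is telling the network configuration to ignore DHCP-provided nameservers and pin an explicit set:

# /etc/netplan/99-dns-override.yaml (illustrative path), apply with `netplan apply`
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
      dhcp4-overrides:
        use-dns: false              # ignore nameservers pushed via DHCP
      nameservers:
        addresses: [185.12.64.1, 185.12.64.2]

The IPv6 nameservers may also arrive via router advertisements, so they could need similar treatment.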

randrusiak avatar Feb 18 '22 13:02 randrusiak

Apologies to bump the post, but I've spent the best part of a day trying to fix this... My logs are polluted by the same Nameserver limits message as OP. Does anyone have a solution to this?

EarthlingDavey avatar Jun 08 '22 09:06 EarthlingDavey

@EarthlingDavey can you please check what kubelet flags you have on both control-plane and worker nodes?

kron4eg avatar Jun 08 '22 11:06 kron4eg

I haven't specified any kubelet flags. I can reproduce with:

  1. Copy the example files
  2. kubeone.yaml
    apiVersion: kubeone.k8c.io/v1beta2
    kind: KubeOneCluster
    versions:
      kubernetes: 1.23.7
    clusterNetwork:
      cni:
        canal:
          mtu: 1400 # Hetzner specific 1450 bytes - 50 VXLAN bytes
    cloudProvider:
      hetzner: {}
      external: true
    
  3. terraform init
  4. terraform apply --auto-approve && terraform output -json > tf.json
  5. kubeone apply --manifest kubeone.yaml --tfjson tf.json -v -y
  6. kubectl get events --sort-by='{.lastTimestamp}' -A
Result is:
kube-system   22s         Warning   DNSConfigForming          pod/canal-t4rgs                                        Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   21s         Warning   DNSConfigForming          pod/node-local-dns-p9zr2                               Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   16s         Warning   DNSConfigForming          pod/etcd-example-control-plane-3                       Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   12s         Warning   DNSConfigForming          pod/coredns-74cfdf5c5d-w2gsl                           Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:1 2a01:4ff:ff00::add:2 185.12.64.1
kube-system   12s         Warning   DNSConfigForming          pod/etcd-example-control-plane-1                       Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:1 2a01:4ff:ff00::add:2 185.12.64.1
kube-system   10s         Warning   DNSConfigForming          pod/canal-9tntj                                        Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:1 2a01:4ff:ff00::add:2 185.12.64.1
kube-system   5s          Warning   DNSConfigForming          pod/kube-proxy-r4c99                                   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   4s          Warning   DNSConfigForming          pod/kube-scheduler-example-control-plane-2             Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   4s          Warning   DNSConfigForming          pod/etcd-example-control-plane-2                       Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   3s          Warning   DNSConfigForming          pod/kube-apiserver-example-control-plane-3             Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1
kube-system   1s          Warning   DNSConfigForming          pod/canal-rgphc                                        Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 185.12.64.1

Happy to check anything else. Edit: I'm in https://kubermatic-community.slack.com/ and available for quick responses in case that helps.

EarthlingDavey avatar Jun 08 '22 12:06 EarthlingDavey

Yeah, I can reproduce it too

kron4eg avatar Jun 08 '22 13:06 kron4eg

Do you know if there is a "kubeone / machine-controller way" to set kubelet flags?

The only kubelet config I can find is in the KubeOne API reference docs, but it is limited to systemReserved, kubeReserved, evictionHard and maxPods.

EarthlingDavey avatar Jun 08 '22 13:06 EarthlingDavey

On control-plane and static worker nodes kubeadm sets resolvConf: /run/systemd/resolve/resolv.conf in /var/lib/kubelet/config.yaml.

On dynamic workers MC writes:

- path: "/etc/systemd/system/kubelet.service.d/extras.conf"
  content: |
    [Service]
    Environment="KUBELET_EXTRA_ARGS=--resolv-conf=/run/systemd/resolve/resolv.conf"

unconditionally on Ubuntu nodes.
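
For reference, the relevant control-plane bit in /var/lib/kubelet/config.yaml looks roughly like this (other fields omitted):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf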

kron4eg avatar Jun 08 '22 13:06 kron4eg

That's interesting and helpful for coming to a solution.

But as pointed out in other comments, an edit to /run/systemd/resolve/resolv.conf is not permanent as long as systemd-resolved is active.

I got as far as adding this to the control planes:

  user_data = <<EOT
#cloud-config
write_files:
- content: |
    nameserver 185.12.64.2
    nameserver 185.12.64.1
    nameserver 2a01:4ff:ff00::add:1
  path: /etc/resolv-limit-3.conf
EOT

My next step would be to:

  1. Try and update the pointer in /var/lib/kubelet/config.yaml
  2. Work out how to write a resolv-limit-3.conf file on dynamic workers.
  3. In MC set KUBELET_EXTRA_ARGS to point to new resolv-limit-3.conf

OR

  1. Disable systemd-resolved on control planes and workers.
  2. Overwrite /run/systemd/resolve/resolv.conf and not be concerned that systemd-resolved will revert these changes (rough sketch below).
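
A rough, untested sketch of that second option as cloud-init user data (caveat: /run is a tmpfs, so the file would not survive a reboot; persisting it, or re-pointing the kubelet at a file under /etc, would need an extra step):

#cloud-config
runcmd:
- systemctl disable --now systemd-resolved
- mkdir -p /run/systemd/resolve
- printf 'nameserver 185.12.64.2\nnameserver 185.12.64.1\n' > /run/systemd/resolve/resolv.conf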

Do you have any further suggestions? I'm no sysadmin, more like full-stack web.

EarthlingDavey avatar Jun 08 '22 13:06 EarthlingDavey

@EarthlingDavey BTW, besides the warning spam in the logs, what other issues do you observe?

kron4eg avatar Jun 08 '22 14:06 kron4eg

AFAICT it is just log spam. Frustrating and concerning, because I am getting about one every 10 seconds on a fresh install.

The pod count will likely 10x when I add apps etc. to the cluster, so I expect > 1 warning per second then.

Edit: thanks for the nudge @kron4eg ... I can get by with filtering the DNSConfigForming warning events with:

kubectl get events --field-selector type!=Warning,reason!=DNSConfigForming -A

EarthlingDavey avatar Jun 08 '22 14:06 EarthlingDavey

I have the same issue. I stumbled upon this thread after searching why my kubeone cluster on Hetzner Cloud is not resolving in-cluster DNS properly.

I have some pods that need to connect to a database. Therefore I am using in-cluster DNS to provide the hostname (e.g. foo-postgres.default.svc.cluster.local).

What is really weird is that none of the pods can resolve this name. But when I install "dig" (e.g. from dnsutils), dig foo-postgres.default.svc.cluster.local actually resolves the name.

lukas-at-harren avatar Jul 05 '22 19:07 lukas-at-harren

@lukas-at-harren I think you probably have this issue, plus another DNS issue.

Because even though I get excessive logs like the ones reported here... my in-cluster and WAN DNS are resolving OK.

EarthlingDavey avatar Jul 05 '22 19:07 EarthlingDavey

Same here, I ran into the same issue. It's a minor annoyance, but it would be nice to get rid of. Has anyone tried whether unbound could be an elegant workaround for this?

Citrullin avatar Jul 12 '22 20:07 Citrullin

No, unbound wouldn't change anything, as the main generated /etc/resolv.conf already contains too many records.

kron4eg avatar Jul 12 '22 20:07 kron4eg

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Oct 30 '22 19:10 kubermatic-bot

/remove-lifecycle stale
/lifecycle frozen

xmudrii avatar Oct 30 '22 19:10 xmudrii