
1.7.1 with hostdns and forwardKubeDNSToHost doesn't resolve anything

Open evanrich opened this issue 1 year ago • 7 comments

Bug Report

Description

This is on a cluster that was upgraded (1.6.x -> 1.7.x), not a fresh install.

After applying the following patch:

machine:
  features:
    hostDNS:
      enabled: true
      forwardKubeDNSToHost: true

DNS resolution fails for both in-cluster and external names.

get dnsupstream:

talosctl -n 192.168.5.10,192.168.5.11,192.168.5.12,192.168.5.15 get dnsupstream
NODE           NAMESPACE   TYPE          ID            VERSION   HEALTHY   ADDRESS
192.168.5.10   network     DNSUpstream   192.168.5.1   1         true      192.168.5.1:53
192.168.5.11   network     DNSUpstream   192.168.5.1   1         true      192.168.5.1:53
192.168.5.12   network     DNSUpstream   192.168.5.1   1         true      192.168.5.1:53
192.168.5.15   network     DNSUpstream   192.168.5.1   1         true      192.168.5.1:53

resolv.conf

 talosctl -n 192.168.5.10 read /system/resolved/resolv.conf
nameserver 10.96.0.9
talosctl -n 192.168.5.10 read /etc/resolv.conf
nameserver 127.0.0.53

resolvers

 talosctl -n 192.168.5.10 get resolvers
NODE           NAMESPACE   TYPE             ID          VERSION   RESOLVERS
192.168.5.10   network     ResolverStatus   resolvers   2         ["192.168.5.1"]

CoreDNS was restarted twice after applying the patch.

Logs

[ERROR] plugin/errors: 2 radarr.media.svc. AAAA: read udp 10.244.2.33:48571->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:37010 - 44799 "AAAA IN radarr.media.svc. udp 34 false 512" - - 0 2.001171487s
[ERROR] plugin/errors: 2 radarr.media.svc. AAAA: read udp 10.244.0.228:41133->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:37010 - 44353 "A IN radarr.media.svc. udp 34 false 512" - - 0 2.001187098s
[ERROR] plugin/errors: 2 radarr.media.svc. A: read udp 10.244.0.228:49164->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.30:55153 - 65462 "AAAA IN sonarr.media.svc. udp 34 false 512" - - 0 2.001133409s
[INFO] 10.244.0.30:55153 - 65275 "A IN sonarr.media.svc. udp 34 false 512" - - 0 2.001014136s
[ERROR] plugin/errors: 2 sonarr.media.svc. A: read udp 10.244.2.33:38186->10.96.0.9:53: i/o timeout
[ERROR] plugin/errors: 2 sonarr.media.svc. AAAA: read udp 10.244.2.33:57661->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.16:57244 - 50161 "AAAA IN api.allegion.yonomi.cloud. udp 43 false 512" - - 0 2.001230715s
[ERROR] plugin/errors: 2 api.allegion.yonomi.cloud. AAAA: read udp 10.244.0.228:33430->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.16:57244 - 49553 "A IN api.allegion.yonomi.cloud. udp 43 false 512" - - 0 2.001237302s
[ERROR] plugin/errors: 2 api.allegion.yonomi.cloud. A: read udp 10.244.0.228:47070->10.96.0.9:53: i/o timeout
[INFO] 10.244.1.242:47829 - 1031 "AAAA IN api.doppler.com. udp 44 false 1232" - - 0 2.001031405s
[INFO] 10.244.1.242:50138 - 44842 "A IN api.doppler.com. udp 44 false 1232" - - 0 2.001066446s
[ERROR] plugin/errors: 2 api.doppler.com. AAAA: read udp 10.244.0.228:52401->10.96.0.9:53: i/o timeout
[ERROR] plugin/errors: 2 api.doppler.com. A: read udp 10.244.0.228:48637->10.96.0.9:53: i/o timeout
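
One way to confirm that every failure points at the same upstream is to tally the CoreDNS error lines. The sketch below is illustrative only: `tally_upstream_errors` and the `ERROR_LINE` pattern are hypothetical helpers, not part of CoreDNS or Talos, and the regex assumes the `read udp SRC->DST: reason` shape seen in the log lines above.

```python
import re
from collections import Counter

# Assumed shape of a CoreDNS `errors` plugin line, taken from the logs above:
# [ERROR] plugin/errors: 2 radarr.media.svc. AAAA: read udp 10.244.2.33:48571->10.96.0.9:53: i/o timeout
ERROR_LINE = re.compile(
    r"plugin/errors: \d+ (?P<name>\S+) (?P<qtype>[A-Z]+): "
    r"read udp (?P<src>\S+)->(?P<upstream>\S+): (?P<reason>.+)$"
)


def tally_upstream_errors(lines):
    """Count error lines per (upstream address, failure reason) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line.strip())
        if match:  # [INFO] lines and other error shapes are skipped
            counts[(match["upstream"], match["reason"])] += 1
    return counts


# Three lines copied from the report: two errors and one query log entry.
SAMPLE = """\
[ERROR] plugin/errors: 2 radarr.media.svc. AAAA: read udp 10.244.2.33:48571->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:37010 - 44799 "AAAA IN radarr.media.svc. udp 34 false 512" - - 0 2.001171487s
[ERROR] plugin/errors: 2 api.doppler.com. A: read udp 10.244.0.228:48637->10.96.0.9:53: i/o timeout
""".splitlines()

if __name__ == "__main__":
    for (upstream, reason), count in tally_upstream_errors(SAMPLE).items():
        print(f"{upstream}\t{reason}\t{count}")
```

On the sample lines, both errors land on `10.96.0.9:53`, the address CoreDNS forwards to once the patch is applied, regardless of which name was being resolved.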

Environment

  • Talos version: 1.7.1
  • Kubernetes version: 1.30.0
  • Platform: dell 5060/7060 nodes

Reverting the patch (both options set back to false) fixes DNS again.

FWIW, here's my coredns configmap:

.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    log . {
        class error
    }
    prometheus :9153

    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

evanrich avatar May 03 '24 07:05 evanrich

CoreDNS graphs go through the roof as well.

evanrich avatar May 03 '24 20:05 evanrich

Greetings! Can you provide talosctl -n 192.168.5.10,192.168.5.11,192.168.5.12,192.168.5.15 logs dns-resolve-cache output?

DmitriyMV avatar May 05 '24 17:05 DmitriyMV

Greetings! Can you provide talosctl -n 192.168.5.10,192.168.5.11,192.168.5.12,192.168.5.15 logs dns-resolve-cache output?

Sure! With

machine:
  features:
    hostDNS:
      enabled: true
      resolveMemberNames: true
      forwardKubeDNSToHost: false

I get ~13k lines; here are the last few:

192.168.5.12: 2024-05-05T19:15:00.325Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 27405\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:15:00.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 27405\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:15:20.325Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 30173\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:15:20.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 30173\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:15:40.325Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26124\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:15:40.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26124\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:00.324Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 44019\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:00.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 44019\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:20.325Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26814\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:20.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26814\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:40.324Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 44389\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:16:40.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 44389\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:00.324Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 59770\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:00.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 59770\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:20.324Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 20152\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:20.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 20152\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:40.325Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 43480\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.12: 2024-05-05T19:17:40.326Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 43480\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}

With

machine:
  features:
    hostDNS:
      enabled: true
      resolveMemberNames: true
      forwardKubeDNSToHost: true

I get

192.168.5.10: 2024-05-05T19:20:52.012Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 13650\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.media.svc.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:20:52.012Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 45522\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.media.svc.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:20:52.013Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NXDOMAIN, id: 13650\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.media.svc.\tIN\t A\n\n;; AUTHORITY SECTION:\n.\t1800\tIN\tSOA\ta.root-servers.net. nstld.verisign-grs.com. 2024050501 1800 900 604800 86400\n"}
192.168.5.10: 2024-05-05T19:20:52.013Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NXDOMAIN, id: 45522\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.media.svc.\tIN\t AAAA\n\n;; AUTHORITY SECTION:\n.\t1800\tIN\tSOA\ta.root-servers.net. nstld.verisign-grs.com. 2024050501 1800 900 604800 86400\n"}
192.168.5.10: 2024-05-05T19:21:01.928Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 61250\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:01.928Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 61250\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:02.682Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 61066\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;radarr.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:02.682Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37810\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;radarr.domain.io.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:21:02.683Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 59884\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:02.683Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 35319\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:21:02.683Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 61066\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;radarr.domain.io.\tIN\t AAAA\n\n;; AUTHORITY SECTION:\ndomain.io.\t1710\tIN\tSOA\trose.ns.cloudflare.com. dns.cloudflare.com. 2340201800 10000 2400 604800 1800\n"}
192.168.5.10: 2024-05-05T19:21:02.683Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37810\n;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;radarr.domain.io.\tIN\t A\n\n;; ANSWER SECTION:\nradarr.domain.io.\t270\tIN\tA\t10.10.5.30\n"}
192.168.5.10: 2024-05-05T19:21:02.686Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 35319\n;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t A\n\n;; ANSWER SECTION:\nsonarr.domain.io.\t5\tIN\tA\t10.10.5.30\n"}
192.168.5.10: 2024-05-05T19:21:02.686Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 59884\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t AAAA\n\n;; AUTHORITY SECTION:\ndomain.io.\t1710\tIN\tSOA\trose.ns.cloudflare.com. dns.cloudflare.com. 2340201800 10000 2400 604800 1800\n"}
192.168.5.10: 2024-05-05T19:21:12.216Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37590\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;.\tIN\t NS\n"}
192.168.5.10: 2024-05-05T19:21:12.217Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37590\n;; flags: qr rd ra; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;.\tIN\t NS\n\n;; ANSWER SECTION:\n.\t3600\tIN\tNS\ta.root-servers.net.\n.\t3600\tIN\tNS\tb.root-servers.net.\n.\t3600\tIN\tNS\tc.root-servers.net.\n.\t3600\tIN\tNS\td.root-servers.net.\n.\t3600\tIN\tNS\te.root-servers.net.\n.\t3600\tIN\tNS\tf.root-servers.net.\n.\t3600\tIN\tNS\tg.root-servers.net.\n.\t3600\tIN\tNS\th.root-servers.net.\n.\t3600\tIN\tNS\ti.root-servers.net.\n.\t3600\tIN\tNS\tj.root-servers.net.\n.\t3600\tIN\tNS\tk.root-servers.net.\n.\t3600\tIN\tNS\tl.root-servers.net.\n.\t3600\tIN\tNS\tm.root-servers.net.\n"}
192.168.5.10: 2024-05-05T19:21:19.651Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 20589\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;s3.domain.io.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:21:19.651Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26931\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;s3.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:19.652Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 20589\n;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;s3.domain.io.\tIN\t A\n\n;; ANSWER SECTION:\ns3.domain.io.\t296\tIN\tA\t104.21.30.117\ns3.domain.io.\t296\tIN\tA\t172.67.172.226\n"}
192.168.5.10: 2024-05-05T19:21:19.652Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 26931\n;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;s3.domain.io.\tIN\t AAAA\n\n;; ANSWER SECTION:\ns3.domain.io.\t296\tIN\tAAAA\t2606:4700:3035::6815:1e75\ns3.domain.io.\t296\tIN\tAAAA\t2606:4700:3037::ac43:ace2\n"}
192.168.5.10: 2024-05-05T19:21:21.928Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37418\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:21.928Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 37418\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:30.483Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 41265\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;plex.tv.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:21:30.483Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 3690\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;plex.tv.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:30.484Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 41265\n;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;plex.tv.\tIN\t A\n\n;; ANSWER SECTION:\nplex.tv.\t30\tIN\tA\t34.243.94.189\nplex.tv.\t30\tIN\tA\t34.241.88.179\n"}
192.168.5.10: 2024-05-05T19:21:30.484Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 3690\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;plex.tv.\tIN\t AAAA\n\n;; AUTHORITY SECTION:\nplex.tv.\t207\tIN\tSOA\tjeremy.ns.cloudflare.com. dns.cloudflare.com. 2340420772 10000 2400 604800 1800\n"}
192.168.5.10: 2024-05-05T19:21:41.927Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 38917\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:41.928Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 38917\n;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;k8s.lab.domain.io.\tIN\t AAAA\n"}
192.168.5.10: 2024-05-05T19:21:45.777Z DEBUG dns request {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 29216\n;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t A\n"}
192.168.5.10: 2024-05-05T19:21:45.778Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 29216\n;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.domain.io.\tIN\t A\n\n;; ANSWER SECTION:\nsonarr.domain.io.\t257\tIN\tA\t10.10.5.30\n"}
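
With ~13k lines of `dns-resolve-cache` output, it can help to reduce each line to its status and question. The following is a minimal sketch, assuming each line is a text prefix followed by a JSON object whose `data` field holds the dig-style message, as in the output above. `parse_resolve_cache_line` is a hypothetical helper, not a Talos tool.

```python
import json
import re

STATUS = re.compile(r"status: (\w+)")
ANSWERS = re.compile(r"ANSWER: (\d+)")
# After JSON decoding, the question line looks like ";name.<TAB>IN<TAB> TYPE".
QUESTION = re.compile(r";; QUESTION SECTION:\n;(\S+)\s+IN\s+(\w+)")


def parse_resolve_cache_line(line):
    """Summarize one dns-resolve-cache log line, or return None if it has no JSON payload."""
    prefix, brace, payload = line.partition("{")
    if not brace:
        return None
    data = json.loads(brace + payload)["data"]
    question = QUESTION.search(data)
    return {
        "kind": "response" if "dns response" in prefix else "request",
        "status": STATUS.search(data).group(1),
        "name": question.group(1),
        "qtype": question.group(2),
        "answers": int(ANSWERS.search(data).group(1)),
    }


# One response line copied from the output above (raw string keeps the JSON escapes intact).
SAMPLE = r'192.168.5.10: 2024-05-05T19:20:52.013Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NXDOMAIN, id: 13650\n;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n\n;; QUESTION SECTION:\n;sonarr.media.svc.\tIN\t A\n\n;; AUTHORITY SECTION:\n.\t1800\tIN\tSOA\ta.root-servers.net. nstld.verisign-grs.com. 2024050501 1800 900 604800 86400\n"}'

if __name__ == "__main__":
    print(parse_resolve_cache_line(SAMPLE))
```

For the sample line this yields an NXDOMAIN response for `sonarr.media.svc.`. In other words, the host resolver did produce a reply for that query, even though CoreDNS logged an i/o timeout waiting for it.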

As soon as the patch is applied and CoreDNS is restarted, I immediately start seeing issues, for example in my Home Assistant logs:

 (SyncWorker_14) [custom_components.radarr_upcoming_media.sensor] Host radarr.domain.io is not available
2024-05-05 12:21:37.684 WARNING (SyncWorker_3) [custom_components.sonarr_upcoming_media.sensor] Host sonarr.domain.io is not available
2024-05-05 12:22:07.685 WARNING (SyncWorker_4) [custom_components.radarr_upcoming_media.sensor] Host radarr.domain.io is not available
2024-05-05 12:22:07.687 WARNING (SyncWorker_50) [custom_components.sonarr_upcoming_media.sensor] Host sonarr.domain.io is not available
2024-05-05 12:22:37.690 WARNING (SyncWorker_46) [custom_components.radarr_upcoming_media.sensor] Host radarr.domain.io is not available
2024-05-05 12:22:37.693 WARNING (SyncWorker_15) [custom_components.sonarr_upcoming_media.sensor] Host sonarr.domain.io is not available

and from the CoreDNS deployment logs themselves:

[ERROR] plugin/errors: 2 ps.pndsn.com. AAAA: read udp 10.244.3.183:42921->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:37176 - 46031 "A IN radarr.media.svc. udp 34 false 512" - - 0 2.000961528s
[INFO] 10.244.0.142:37176 - 46771 "AAAA IN radarr.media.svc. udp 34 false 512" - - 0 2.000981288s
[ERROR] plugin/errors: 2 radarr.media.svc. AAAA: read udp 10.244.0.58:39689->10.96.0.9:53: i/o timeout
[ERROR] plugin/errors: 2 radarr.media.svc. A: read udp 10.244.0.58:56654->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:34045 - 18857 "AAAA IN sonarr.media.svc. udp 34 false 512" - - 0 2.0010946020000002s
[ERROR] plugin/errors: 2 sonarr.media.svc. AAAA: read udp 10.244.3.183:57222->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.142:34045 - 18443 "A IN sonarr.media.svc. udp 34 false 512" - - 0 2.001069037s
[ERROR] plugin/errors: 2 sonarr.media.svc. A: read udp 10.244.3.183:60187->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.171:33221 - 59865 "A IN s3.domain.io. udp 33 false 512" - - 0 2.001200777s
[INFO] 10.244.0.171:33221 - 17887 "AAAA IN s3.domain.io. udp 33 false 512" - - 0 2.001220341s
[ERROR] plugin/errors: 2 s3.domain.io. A: read udp 10.244.3.183:42636->10.96.0.9:53: i/o timeout
[ERROR] plugin/errors: 2 s3.domain.io. AAAA: read udp 10.244.3.183:51402->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.9:44995 - 38535 "AAAA IN ps.pndsn.com. udp 30 false 512" - - 0 2.00101046s
[ERROR] plugin/errors: 2 ps.pndsn.com. AAAA: read udp 10.244.3.183:46826->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.9:44995 - 38373 "A IN ps.pndsn.com. udp 30 false 512" - - 0 2.001172459s
[ERROR] plugin/errors: 2 ps.pndsn.com. A: read udp 10.244.3.183:49263->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.213:33246 - 976 "A IN github.com. udp 39 false 1232" - - 0 2.00063978s
[ERROR] plugin/errors: 2 github.com. A: read udp 10.244.3.183:57869->10.96.0.9:53: i/o timeout
[INFO] 10.244.0.213:41944 - 26300 "AAAA IN github.com. udp 39 false 1232" - - 0 2.001602828s
[ERROR] plugin/errors: 2 github.com. AAAA: read udp 10.244.0.58:52988->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.9:44995 - 38535 "AAAA IN ps.pndsn.com. udp 30 false 512" - - 0 2.000211697s
[ERROR] plugin/errors: 2 ps.pndsn.com. AAAA: read udp 10.244.3.183:48895->10.96.0.9:53: i/o timeout
[INFO] 10.244.3.9:44995 - 38373 "A IN ps.pndsn.com. udp 30 false 512" - - 0 2.000241596s
[ERROR] plugin/errors: 2 ps.pndsn.com. A: read udp 10.244.3.183:41034->10.96.0.9:53: i/o timeout

Changing forwardKubeDNSToHost: true back to false brings things back to normal. I can post my machine config if that helps, but there is nothing too unusual in it. After restarting the CoreDNS deployment, the logs are clean again:

.:53
[INFO] plugin/reload: Running configuration SHA512 = f43368fe881b6cd37b121f37ba0b71c065df5bfc99b5c5c05d7f95bf82289d7ab7e78d5b98c1f02172d8004a8a8f34027cef04e86c780d40a7c5d1301559f5b3
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
.:53
[INFO] plugin/reload: Running configuration SHA512 = f43368fe881b6cd37b121f37ba0b71c065df5bfc99b5c5c05d7f95bf82289d7ab7e78d5b98c1f02172d8004a8a8f34027cef04e86c780d40a7c5d1301559f5b3
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2

evanrich avatar May 05 '24 19:05 evanrich

Same issue for me, but unfortunately disabling hostDNS features doesn't resolve the issue.

I am using my own DNS servers, however using public DNS servers didn't help.

It worked fine using version 1.6.7, failed to work from 1.7.0, keeps failing in 1.7.1.

chrxmvtik avatar May 05 '24 20:05 chrxmvtik

It worked fine using version 1.6.7, failed to work from 1.7.0, keeps failing in 1.7.1.

Let's not mix different issues in one ticket please.

smira avatar May 07 '24 14:05 smira

@evanrich what is the CNI you're using?

smira avatar May 07 '24 14:05 smira

@evanrich what is the CNI you're using?

Cilium v1.15.4

evanrich avatar May 07 '24 17:05 evanrich

I'm seeing the same problem on Talos 1.7.1 (also upgraded from earlier versions), Kubernetes 1.29.1, Cilium 1.15.4.

I am using DHCP-discovered public DNS servers run by Hetzner.

Hubble (Cilium packet inspection) reports that the UDP requests from CoreDNS to the Talos DNS service IP (10.96.0.9 in my case) are delivered, but the response packets from 10.96.0.9 to CoreDNS pod are dropped with the reason TTL Exceeded.

MathiasPius avatar May 14 '24 12:05 MathiasPius

I have the same error. I'm using talos v1.7.1 and cilium v1.14.7

pau-campana avatar May 14 '24 12:05 pau-campana

I'm seeing the same problem on Talos 1.7.1 (also upgraded from earlier versions), Kubernetes 1.29.1, Cilium 1.15.4.

I am using DHCP-discovered public DNS servers run by Hetzner.

Hubble (Cilium packet inspection) reports that the UDP requests from CoreDNS to the Talos DNS service IP (10.96.0.9 in my case) are delivered, but the response packets from 10.96.0.9 to CoreDNS pod are dropped with the reason TTL Exceeded.

Check whether you are using bpf.masquerade. If you are and you did not specify CIDRs manually, you will get the above error with common private CIDRs.

Try to set bpf.masquerade option to false and check if that works.

chrxmvtik avatar May 14 '24 12:05 chrxmvtik

I'm seeing the same problem on Talos 1.7.1 (also upgraded from earlier versions), Kubernetes 1.29.1, Cilium 1.15.4. I am using DHCP-discovered public DNS servers run by Hetzner. Hubble (Cilium packet inspection) reports that the UDP requests from CoreDNS to the Talos DNS service IP (10.96.0.9 in my case) are delivered, but the response packets from 10.96.0.9 to CoreDNS pod are dropped with the reason TTL Exceeded.

Check whether you are using bpf.masquerade. If you are and you did not specify CIDRs manually, you will get the above error with common private CIDRs.

Try to set bpf.masquerade option to false and check if that works.

Sounds very plausible. However, bpf masquerade is disabled for my use case, but I can see that iptables masquerade for ipv4 is enabled. I would assume disabling this would have the same effect?

Edit: I disabled all masquerading:

$ kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep Masquerading
Masquerading:            Disabled

~~But I'm still seeing the exact same issue.~~ I am now seeing the issue with the public IP address of the DNS Server instead.

It seems to me that masquerading is a very likely culprit, but I'm not sure how exactly yet. Will keep digging.

MathiasPius avatar May 14 '24 12:05 MathiasPius

The fix is coming, thanks for reporting it; it is indeed the TTL. It is only related to the forwardKubeDNSToHost option, which is not enabled by default in Talos 1.7 (it is only enabled for Docker-based clusters).

smira avatar May 15 '24 13:05 smira

Reopened until there is a 1.7 backport.

DmitriyMV avatar May 15 '24 17:05 DmitriyMV

Closed per #8758

DmitriyMV avatar May 17 '24 18:05 DmitriyMV

@DmitriyMV not sure if this is related, but after upgrading 1.7.1->1.7.2, while better, I now see other errors:

.:53
[INFO] plugin/reload: Running configuration SHA512 = f43368fe881b6cd37b121f37ba0b71c065df5bfc99b5c5c05d7f95bf82289d7ab7e78d5b98c1f02172d8004a8a8f34027cef04e86c780d40a7c5d1301559f5b3
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
.:53
[INFO] plugin/reload: Running configuration SHA512 = f43368fe881b6cd37b121f37ba0b71c065df5bfc99b5c5c05d7f95bf82289d7ab7e78d5b98c1f02172d8004a8a8f34027cef04e86c780d40a7c5d1301559f5b3
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
[INFO] 10.244.0.100:34471 - 62223 "AAAA IN registry.npmjs.org. udp 36 false 512" - - 0 5.00010545s
[ERROR] plugin/errors: 2 registry.npmjs.org. AAAA: dns: buffer size too small
[INFO] 10.244.0.23:42181 - 51958 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000088228s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:45773 - 55470 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000083312s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:45773 - 55229 "A IN api.ring.com. udp 30 false 512" - - 0 5.000134008s
[ERROR] plugin/errors: 2 api.ring.com. A: dns: overflowing header size
[INFO] 10.244.0.23:45773 - 55229 "A IN api.ring.com. udp 30 false 512" - - 0 5.000123943s
[ERROR] plugin/errors: 2 api.ring.com. A: dns: overflowing header size
[INFO] 10.244.0.23:45773 - 55470 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000255962s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:45773 - 55470 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000144994s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:40964 - 39446 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000402892s
[INFO] 10.244.0.23:40964 - 39215 "A IN api.ring.com. udp 30 false 512" - - 0 5.000479894s
[ERROR] plugin/errors: 2 api.ring.com. A: dns: overflowing header size
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:40964 - 39446 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000031736s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:40964 - 39215 "A IN api.ring.com. udp 30 false 512" - - 0 5.000118852s
[ERROR] plugin/errors: 2 api.ring.com. A: dns: overflowing header size
[INFO] 10.244.0.23:40964 - 39446 "AAAA IN api.ring.com. udp 30 false 512" - - 0 5.000074868s
[ERROR] plugin/errors: 2 api.ring.com. AAAA: dns: overflowing header size
[INFO] 10.244.0.23:40964 - 39215 "A IN api.ring.com. udp 30 false 512" - - 0 5.000111692s
[ERROR] plugin/errors: 2 api.ring.com. A: dns: overflowing header size

This is based on the following config:

machine:
  features:
    hostDNS:
      enabled: true
      resolveMemberNames: true
      forwardKubeDNSToHost: true

The only thing I flipped from 1.7.1 to 1.7.2 was forwardKubeDNSToHost.

evanrich avatar May 17 '24 22:05 evanrich

1.7.3 fixes the errors above

evanrich avatar May 29 '24 23:05 evanrich