[Question] "Waiting for resolver" - what can I do?
Occasionally I get "NXDOMAIN" responses in my browser for domains I'm quite sure work, and sure enough, after a little while the browser loads the page (I guess Edge is retrying in the background).
If I'm quick and do a DNS query myself, I get something like the output below. Is there anything I can do to reduce the likelihood of this? Increase thread counts? Buffers? Cache time? Reduce timeouts and increase retry counters?
I suspect the actual nameservers behind this domain are having issues and that there is really nothing to do, but I thought I'd ask anyway :)
$ dig hardspaceshipbreaker.fandom.com
; <<>> DiG 9.16.1-Ubuntu <<>> hardspaceshipbreaker.fandom.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 41662
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; OPT=15: 00 00 57 61 69 74 69 6e 67 20 66 6f 72 20 72 65 73 6f 6c 76 65 72 ("..Waiting for resolver")
;; QUESTION SECTION:
;hardspaceshipbreaker.fandom.com. IN A
;; ANSWER SECTION:
hardspaceshipbreaker.fandom.com. 3592 IN CNAME fandom.com.
;; Query time: 4756 msec
;; SERVER: 192.168.1.2#53(192.168.1.2)
;; WHEN: Mon May 30 18:49:59 CEST 2022
;; MSG SIZE rcvd: 100
Web browsers show a generic error for DNS-related failures, so DNS_PROBE_FINISHED_NXDOMAIN does not necessarily mean that the domain does not exist. The response could just as well have been NODATA (the domain exists but has no record of the queried type), or the RCODE could have been ServerFailure. The only way to know the exact issue is to query the DNS server directly, like you already did.
Your query response carries the extended DNS error text "Waiting for resolver". This message means that the DNS server does not have the data in its cache and is still waiting for the resolver that is working in the background.
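For anyone wanting to read that OPT=15 line by hand: option 15 is the Extended DNS Error option from RFC 8914, which is a 2-byte info-code followed by optional UTF-8 text. Below is a minimal sketch that decodes the hex bytes exactly as dig prints them. The function name `decode_ede` and the partial code table are just for illustration and are not part of any DNS library.

```python
# Partial table of RFC 8914 info-codes; only a few entries are listed here.
EDE_CODES = {
    0: "Other",
    3: "Stale Answer",
    6: "DNSSEC Bogus",
    22: "No Reachable Authority",
    23: "Network Error",
}


def decode_ede(hex_bytes: str):
    """Decode the payload of an EDE option given as hex, e.g. copied from dig.

    Returns (info_code, code_name, extra_text).
    """
    data = bytes.fromhex(hex_bytes)  # whitespace between byte pairs is ignored
    if len(data) < 2:
        raise ValueError("EDE option needs at least the 2-byte info-code")
    info_code = int.from_bytes(data[:2], "big")  # network byte order
    extra_text = data[2:].decode("utf-8", errors="replace")
    return info_code, EDE_CODES.get(info_code, "unknown"), extra_text


if __name__ == "__main__":
    payload = "00 00 57 61 69 74 69 6e 67 20 66 6f 72 20 72 65 73 6f 6c 76 65 72"
    print(decode_ede(payload))
```

For the dig output in the question, the info-code is 0 ("Other") and the text is "Waiting for resolver", so the detail is carried entirely by the free-text part.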
This is quite common when running a recursive resolver, and there is not much that can be done about it. The domain's name servers are not responding in time, so the resolver has to retry and then switch to the next name server and query again. Once a response has been received and cached, though, the Serve Stale feature helps: the next time the record expires and the name servers are slow to respond, the server answers queries from the stale cache data until the resolver has refreshed the cache. So if you keep the DNS server running all the time, you will see this issue less often.
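To illustrate the serve-stale idea in general terms, here is a rough sketch of a cache that keeps expired records around for a while and reports whether an answer is fresh or stale. This is not how the DNS server implements it; the class name `StaleCache`, the `max_stale` default of three days, and the injectable `clock` are all assumptions made for the example.

```python
import time


class StaleCache:
    """Toy cache that can hand out expired entries for a limited time."""

    def __init__(self, max_stale=3 * 24 * 3600, clock=time.monotonic):
        self._entries = {}           # name -> (value, expiry timestamp)
        self._max_stale = max_stale  # assumed limit on how long stale data is usable
        self._clock = clock

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        """Return (value, state) where state is 'fresh', 'stale' or 'miss'."""
        entry = self._entries.get(name)
        if entry is None:
            return None, "miss"
        value, expires_at = entry
        past_expiry = self._clock() - expires_at
        if past_expiry < 0:
            return value, "fresh"
        if past_expiry <= self._max_stale:
            # Expired, but still usable while a refresh runs in the background.
            return value, "stale"
        del self._entries[name]      # too old to serve at all
        return None, "miss"
```

A resolver built on something like this would answer from a "stale" hit right away and kick off a background refresh, which is why the failure mostly shows up on the very first lookup of a name.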
Thanks - I figured as much - that's just life.. Oh well :)
It would in theory be possible to do something like:
- When looking up a domain:
  - If the nameservers have never been reached before:
    - Send a request to all of them
    - The first to reply will be used, and it is recorded that this nameserver was "up" or "best". The other replies will also be tracked
  - If the nameservers have been reached before, and we know which are "up" or "best":
    - Prioritize the nameserver that last replied
The trick here being: if we don't know, just ask all of them in parallel, but once we do know which servers can reply, ask only that one server..
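A rough sketch of that idea, purely to make the proposal concrete. Everything here is hypothetical: `NameServerPicker` and the `send_query(server, question)` callable are made-up names, the callable is assumed to raise on timeout or failure, and only the first server to reply is remembered (the "track the other replies too" part is left out). It also assumes the server addresses are already known.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class NameServerPicker:
    def __init__(self, send_query):
        self._send_query = send_query  # hypothetical: (server, question) -> answer
        self._best = {}                # zone -> server that last replied

    def resolve(self, zone, servers, question):
        best = self._best.get(zone)
        if best in servers:
            try:
                # We already know a server that replies, so ask only that one.
                return self._send_query(best, question)
            except Exception:
                del self._best[zone]   # it stopped replying, ask everyone again
        return self._ask_all(zone, servers, question)

    def _ask_all(self, zone, servers, question):
        if not servers:
            raise RuntimeError("no name servers to ask")
        pool = ThreadPoolExecutor(max_workers=len(servers))
        try:
            futures = {pool.submit(self._send_query, s, question): s
                       for s in servers}
            for future in as_completed(futures):
                try:
                    answer = future.result()
                except Exception:
                    continue           # this server failed, wait for the others
                self._best[zone] = futures[future]
                return answer
        finally:
            # Do not wait for slow servers; their threads finish on their own.
            pool.shutdown(wait=False, cancel_futures=True)
        raise RuntimeError("no name server replied")
```

The first lookup for a zone fans out to every server and costs one query per server; later lookups cost a single query as long as the remembered server keeps answering.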
Ya, I have something of that sort in mind, since a lot of other DNS servers do "remember" the best set of name servers to try, to save time when querying again.
The problem with parallel name server queries is that often no glue record is available for the name servers, so you have to resolve the name server's domain name first, and only once you have its IP address can you ask it the actual question. It's also quite common that the name server's domain name itself comes without glue records, so you have to go one level deeper to resolve that first. This is why the current recursive algorithm does not do parallel resolutions: to avoid too many outbound queries and to avoid the complexity.