
PTR lookup returns NXDOMAIN if upstream DNS answers too fast

Open: TheB1gG opened this issue 1 year ago

Prerequisites

Platform (OS and CPU architecture)

Linux, AMD64 (aka x86_64)

Installation

Snapcraft

Setup

Local AdGuardHome -> Remote AdGuardHome -> Ubiquiti USG-3

AdGuard Home version

v0.107.43

Action

dig -x 192.168.2.86

Expected result

dig -x 192.168.2.86

; <<>> DiG 9.19.19-1-Debian <<>> -x 192.168.2.86
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60835
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;86.2.168.192.in-addr.arpa.     IN      PTR

;; ANSWER SECTION:
86.2.168.192.in-addr.arpa. 0    IN      PTR     Family-Room.main.internal.

Actual result

dig -x 192.168.2.86

; <<>> DiG 9.19.19-1-Debian <<>> -x 192.168.2.86
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 62851
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;86.2.168.192.in-addr.arpa.     IN      PTR

;; AUTHORITY SECTION:
86.2.168.192.in-addr.arpa. 10   IN      SOA     fake-for-negative-caching.adguard.com. hostmaster.86.2.168.192.in-addr.arpa. 100500 1800 900 604800 86400

Additional information and/or screenshots

With this order of packets, I get the correct PTR from AdGuard Home:

22:50:16.739784 enp1s0f0 Out IP 192.168.2.2.34348 > 192.168.2.1.53: 1539+ [1au] PTR? 86.2.168.192.in-addr.arpa. (66)
22:50:16.739784 enp1s0f0 Out IP 192.168.2.2.38960 > 192.168.2.1.53: 1539+ [1au] PTR? 86.2.168.192.in-addr.arpa. (66)
22:50:16.739785 enp1s0f0 Out IP 192.168.2.2.54854 > 192.168.2.1.53: 1539+ [1au] PTR? 86.2.168.192.in-addr.arpa. (66)
22:50:16.740586 enp1s0f0 In  IP 192.168.2.1.53 > 192.168.2.2.34348: 1539* 1/0/1 PTR Family-Room.main.internal. (93)
22:50:16.740946 enp1s0f0 In  IP 192.168.2.1.53 > 192.168.2.2.38960: 1539* 1/0/1 PTR Family-Room.main.internal. (93)
22:50:16.741365 enp1s0f0 In  IP 192.168.2.1.53 > 192.168.2.2.54854: 1539* 1/0/1 PTR Family-Room.main.internal. (93)

With this order, I get NXDOMAIN:

22:50:19.696258 enp1s0f0 Out IP 192.168.2.2.55662 > 192.168.2.1.53: 45518+ PTR? 86.2.168.192.in-addr.arpa. (43)
22:50:19.696880 enp1s0f0 In  IP 192.168.2.1.53 > 192.168.2.2.55662: 45518* 1/0/0 PTR Family-Room.main.internal. (82)
22:50:20.736093 enp1s0f0 Out IP 192.168.2.2.34214 > 192.168.2.1.53: 51347+ [1au] PTR? 86.2.168.192.in-addr.arpa. (66)
22:50:20.736093 enp1s0f0 Out IP 192.168.2.2.37794 > 192.168.2.1.53: 51347+ [1au] PTR? 86.2.168.192.in-addr.arpa. (66)
22:50:20.736837 enp1s0f0 In  IP 192.168.2.1.53 > 192.168.2.2.34214: 51347* 1/0/1 PTR Family-Room.main.internal. (93)
22:50:20.737249 enp1s0f0 In  IP 192.168.2.1.53 > 192.168.2.2.37794: 51347* 1/0/1 PTR Family-Room.main.internal. (93)

Logfile: https://gist.github.com/TheB1gG/a1df1e733ab3cacfb37ac61140fbe1b3

TheB1gG, Jan 30 '24 16:01
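The packet traces in the issue description show several outgoing queries that share one DNS message ID but leave from different source ports (for example, ID 51347 from ports 34214 and 37794). To make that easier to see, here is a small Go sketch that groups outgoing queries by ID. It assumes the exact one-line `tcpdump` layout shown above (timestamp, interface, direction, `IP`, source, `>`, destination, then the ID followed by flag characters). `queriesByID` is a hypothetical helper written for this illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// queriesByID groups outgoing DNS queries from tcpdump one-line output by DNS
// message ID and returns, for each ID, the source ports that sent it.
// Assumes the "TIME IFACE Out IP SRC > DST: ID..." layout from the trace.
func queriesByID(lines []string) map[string][]string {
	byID := make(map[string][]string)
	for _, line := range lines {
		f := strings.Fields(line)
		if len(f) < 8 || f[2] != "Out" {
			continue // skip replies ("In") and lines with another layout
		}
		src := f[4] // e.g. "192.168.2.2.34214"; the port follows the last dot
		port := src[strings.LastIndex(src, ".")+1:]
		// The ID is followed by flag characters such as "+", so keep only
		// the leading digits.
		id := f[7]
		if i := strings.IndexFunc(id, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
			id = id[:i]
		}
		byID[id] = append(byID[id], port)
	}
	return byID
}

func main() {
	trace := []string{
		"22:50:19.696258 enp1s0f0 Out IP 192.168.2.2.55662 > 192.168.2.1.53: 45518+ PTR? 86.2.168.192.in-addr.arpa. (43)",
		"22:50:19.696880 enp1s0f0 In  IP 192.168.2.1.53 > 192.168.2.2.55662: 45518* 1/0/0 PTR Family-Room.main.internal. (82)",
		"22:50:20.736093 enp1s0f0 Out IP 192.168.2.2.34214 > 192.168.2.1.53: 51347+ [1au] PTR? 86.2.168.192.in-addr.arpa. (66)",
		"22:50:20.736093 enp1s0f0 Out IP 192.168.2.2.37794 > 192.168.2.1.53: 51347+ [1au] PTR? 86.2.168.192.in-addr.arpa. (66)",
	}
	byID := queriesByID(trace)
	ids := make([]string, 0, len(byID))
	for id := range byID {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic output order
	for _, id := range ids {
		fmt.Printf("ID %s sent from ports %v\n", id, byID[id])
	}
	// Prints:
	// ID 45518 sent from ports [55662]
	// ID 51347 sent from ports [34214 37794]
}
```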

Can't you reproduce it, or why hasn't this issue received any labels?

TheB1gG, Feb 11 '24 21:02

Sorry for the late response.

2024/01/29 22:13:21.444936 3180371#48 [debug] dnsforward: recursion detected resolving "86.2.168.192.in-addr.arpa."

This line shows you what could be wrong. It seems like your configuration of AdGuard Home is causing it to query PTRs from itself. To prevent this, inspect your configuration and set the upstreams for PTRs, including those from locally-served networks, explicitly.

ainar-g, Feb 15 '24 18:02

~~Thank you for the response, @ainar-g. You pointed me in the right direction. I tested with Debian on WSL2, and it sends every DNS query three times for reasons unknown to me. So maybe you could relax your recursion detection a little, or convince Debian or Microsoft to send each query only once when running on WSL2?~~

TheB1gG, Feb 16 '24 13:02

Hi @ainar-g, I simply couldn't find a way to get it working. PTR lookups in Windows tracert don't work either because of this. The behavior of tracert is shown in a screenshot (and in another one for IPv6). Did you ever test your recursion implementation with reverse DNS servers that have a latency of over 50 ms, and with real-world applications like tracert or traceroute? Your recursion detection triggers there before the remote DNS server responsible for that IP range can answer. Is there an option to disable the recursion detection in the meantime?

Since I don't know whether replies to closed issues are read, I will open a new issue in a week with the updated information if I don't get a reply here.

Thank you

TheB1gG, Feb 17 '24 22:02

@TheB1gG, I'm sorry, but I am not sure what you're asking about here. If the network configuration on the machine allows sending PTR queries to itself, we do not consider it a valid configuration, since it'd just end up in infinite loops of queries. There isn't an option to disable this check, and you should inspect and fix the configuration that allows this to occur in the first place. And, unless I'm mistaken, latency shouldn't have anything to do with this, as the recursion detection logic is only based on the message ID, type, and target.

If you need help figuring out how to prevent this, you can ask around in the Discussions.

ainar-g, Feb 21 '24 12:02

@ainar-g, the configuration does not allow sending queries to itself. As you can see in the screenshot in https://github.com/AdguardTeam/AdGuardHome/issues/6691#issuecomment-1950492874, the client sends query 1 and then retries before query 1 has been answered. Query 2 is answered immediately with NXDOMAIN because the recursion detection triggers. Shortly afterwards the answer from the upstream arrives and query 1 gets answered, but the client ignores it because it has already received the NXDOMAIN. If the latency is lower than the retry timeout (which seems to be 40 ms for Microsoft), everything works fine. It fails only if the retry happens while AdGuard Home is still waiting for the upstream's response.

Just to make sure you understand me correctly: at no time does AdGuard Home query itself. If that can't be read from the log I attached earlier, please tell me which parts of the logs or config you need to confirm that there is no recursion.

TheB1gG, Feb 22 '24 15:02

The only other thing that could cause this is if the software in question is reusing message IDs, because, as mentioned, the logic is based on message ID, resource type, and target. That can potentially cause all sorts of issues with all sorts of DNS servers, so if you have access to the software, I'd recommend using a randomized ID.

ainar-g, Feb 22 '24 16:02
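A minimal sketch of the maintainer's suggestion, assuming the querying software is written in Go and builds its own DNS messages: draw each new query's 16-bit message ID from the operating system's cryptographically secure random source. `randomID` is a hypothetical helper name.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// randomID returns a uniformly random 16-bit DNS message ID read from
// crypto/rand, so unrelated queries are unlikely to share an ID.
func randomID() (uint16, error) {
	var b [2]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint16(b[:]), nil
}

func main() {
	for i := 0; i < 3; i++ {
		id, err := randomID()
		if err != nil {
			panic(err)
		}
		fmt.Println("query ID:", id)
	}
}
```

This only separates distinct queries. A retransmission of the same query may still keep its original ID, depending on the client.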

It looks like tracert does reuse the message ID. Wireshark even warns about it (Wireshark was running on the querying client; see screenshot). Software-wise, we are talking about the normal tracert program that ships with every Windows installation. I don't think the solution can be to replace standard Windows tools. If AdGuard Home didn't answer at all to a query it considers recursion, it would work, but the NXDOMAIN confuses tools like traceroute and mtr.

TheB1gG, Feb 23 '24 01:02

@ainar-g, could you please reopen this and give it the bug label? After reading most of the DNS RFCs, it is completely RFC-conformant for programs to retry with the same message ID. The problem is that AdGuard Home's handling of it is not RFC-conformant. Thank you.

TheB1gG, Feb 28 '24 21:02