FTL
Non-FQDN resolving/conditional forwarding doesn't work properly with two search domains configured
Versions
$ pihole -v
Core
Version is v5.18.3-457-ga8d305d5 (Latest: null)
Branch is development-v6
Hash is a8d305d5 (Latest: a8d305d5)
Web
Version is v5.21-929-g085c2880 (Latest: null)
Branch is development-v6
Hash is 085c2880 (Latest: 085c2880)
FTL
Version is vDev-e5a24bd (Latest: null)
Branch is development-v6
Hash is e5a24bdd (Latest: e5a24bdd)
Platform
- OS and version: Fedora 40
- Platform: KVM
Expected behavior
When two search domains are configured on a client and more than one conditional forwarder is configured in Pi-Hole, Pi-Hole should pass on the NXDOMAIN it receives from a conditional forwarder for a name that does not exist, instead of marking the query as Blocked (external, NXRA) and responding with 0.0.0.0 / ::. Without an NXDOMAIN response, the client does not attempt to resolve the non-FQDN hostname using the second configured search domain.
Actual behavior / bug
Pihole responds A 0.0.0.0 / AAAA :: to the non-FQDN query for host1:
$ host host1
host1.domain1.lan has address 0.0.0.0
host1.domain1.lan has IPv6 address ::
Therefore the client never tries to resolve host1.domain2.lan, which would trigger conditional forwarding to the other internal DNS server, which has a valid entry for host1.domain2.lan and could satisfy this request.
Steps to reproduce
Scenario:
- Two networks with separate domains and resolvers exist:
- domain1.lan with 10.0.0.0/24 and resolver 10.0.0.10
- domain2.lan with 192.168.0.0/24 and resolver 192.168.0.10
- a Pi-Hole instance is running at 10.0.0.40
The client has the following DNS settings:
$ resolvectl
Link 1 (wlp2s0)
Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.0.0.40
DNS Servers: 10.0.0.40
DNS Domain: domain1.lan domain2.lan
Configuration for Pi-Hole under Settings -> DNS -> Conditional Forwarding
true,10.0.0.0/24,10.0.0.10,domain1.lan
true,192.168.0.0/24,192.168.0.10,domain2.lan
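To make the two entries above easier to read, here is a minimal Python sketch that splits each line into its four comma-separated fields (enabled, network, server, domain) and picks the forwarder for a query name by domain suffix. The names `Forwarder`, `parse_forwarder` and `pick_server` are hypothetical, and the suffix matching is only an illustration, not how FTL or dnsmasq selects an upstream. The reverse (PTR) side of conditional forwarding is left out.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import Iterable, Optional, Union


@dataclass(frozen=True)
class Forwarder:
    """One conditional forwarding entry (hypothetical helper type)."""
    enabled: bool
    network: Union[IPv4Network, IPv6Network]
    server: str
    domain: str


def parse_forwarder(line: str) -> Forwarder:
    # Field order as shown in the issue: enabled,network,server,domain
    enabled, cidr, server, domain = (part.strip() for part in line.split(","))
    return Forwarder(
        enabled=enabled.lower() == "true",
        network=ip_network(cidr),
        server=server,
        domain=domain.lower().rstrip("."),
    )


def pick_server(name: str, forwarders: Iterable[Forwarder]) -> Optional[str]:
    """Return the server of the first enabled entry whose domain matches."""
    name = name.lower().rstrip(".")
    for fwd in forwarders:
        if fwd.enabled and (name == fwd.domain or name.endswith("." + fwd.domain)):
            return fwd.server
    return None  # no conditional forwarder applies


forwarders = [
    parse_forwarder("true,10.0.0.0/24,10.0.0.10,domain1.lan"),
    parse_forwarder("true,192.168.0.0/24,192.168.0.10,domain2.lan"),
]
```

With this configuration, a query for host1.domain1.lan goes to 10.0.0.10 and a query for host1.domain2.lan goes to 192.168.0.10.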
Steps to reproduce the behavior:
- User tries to query the non-FQDN host host1
- The client expands this to host1.domain1.lan due to the search domain setting of domain1.lan domain2.lan
- Pi-Hole receives a query for host1.domain1.lan
- The first DNS server (10.0.0.10) that receives this request due to conditional forwarding does not have a valid RRSet for this domain
- Pi-Hole receives the NXDOMAIN from 10.0.0.10 and decides to block the request as it doesn't allow this request to be forwarded to the internet
- The client receives an A 0.0.0.0 / AAAA :: response from Pi-Hole and is satisfied. Had it received an NXDOMAIN response it would have tried querying host1.domain2.lan, which would have yielded the desired response.
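The steps above can be sketched as a simplified search-list walk in Python. This is only a model of the client behaviour described in this issue, not the logic of any real stub resolver (which has further rules, such as ndots handling). The function names and the address 192.168.0.50 for host1.domain2.lan are made up for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (rcode, addresses) as seen by the client
Answer = Tuple[str, List[str]]


def resolve_short_name(
    host: str,
    search_domains: Sequence[str],
    query: Callable[[str], Answer],
) -> Optional[Tuple[str, List[str]]]:
    """Try each search domain in order; only NXDOMAIN moves on to the next."""
    for domain in search_domains:
        fqdn = f"{host}.{domain}"
        rcode, addresses = query(fqdn)
        if rcode == "NXDOMAIN":
            continue  # name does not exist here, try the next search domain
        return fqdn, addresses  # any other reply ends the search
    return None


def make_pihole(blocking_mode: str) -> Callable[[str], Answer]:
    """Fake Pi-Hole that treats the upstream NXDOMAIN for domain1.lan as a block."""
    def query(fqdn: str) -> Answer:
        if fqdn == "host1.domain2.lan":
            return "NOERROR", ["192.168.0.50"]  # hypothetical address
        if blocking_mode == "NULL":
            return "NOERROR", ["0.0.0.0"]  # blocked reply in NULL mode
        return "NXDOMAIN", []
    return query


search = ["domain1.lan", "domain2.lan"]
null_result = resolve_short_name("host1", search, make_pihole("NULL"))
nx_result = resolve_short_name("host1", search, make_pihole("NXDOMAIN"))
```

In this model the NULL reply ends the search at host1.domain1.lan with 0.0.0.0, while an NXDOMAIN reply lets the client continue to host1.domain2.lan.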
Debug Token
- URL: https://tricorder.pi-hole.net/2tmjFPFg/
This issue is stale because it has been open 30 days with no activity. Please comment or update this issue or it will be closed in 5 days.
Do you see the same behaviour if you set the blocking mode to NXDOMAIN rather than NULL?
When I change the blocking mode to NXDOMAIN the behaviour changes to working as intended:
- I issue host host1 on my client
- Client automatically appends first configured search domain
- Client queries PiHole with host1.domain1.lan
- PiHole sends an NXDOMAIN for host1.domain1.lan
- Client retries with host1.domain2.lan (the second configured search domain)
- PiHole forwards this to the DNS server configured in conditional forwarding for this domain
- Client receives a correct result for host1.domain2.lan from the forwarded server via PiHole
Thanks for the update.
@DL6ER any thoughts here?
@kaechele This is not necessarily a setup I can easily reproduce here, but let me start by asking whether this is still an issue with the most recent development-v6. I recall us having fixed something concerning the detection of the external blocked status a few weeks ago. This may have coincided with your issue ticket, which I unfortunately missed myself. I will move this to the right repository.
If it still exists with your previous configuration (which may be the case), please run
sudo pihole-FTL --config debug.queries true
and run host host1 on your client again. The related content in /var/log/pihole/FTL.log should give us a better picture of what is going on here (and hopefully why FTL seems to have detected an upstream blocking attempt with NXRA).
I'm pretty sure the culprit is this: https://github.com/pi-hole/FTL/blob/61a211f1c187206f5ff901afae657968114fde15/src/dnsmasq_interface.c#L2617-L2626
Context
I reverted dns.blocking.mode back to NULL (the default) and set debug.queries to true to capture the following log:
Query Log for host1 (non-FQDN)
2024-10-17 02:33:13.758 UTC [1023M] DEBUG_QUERIES: **** new UDP IPv4 query[A] query "host1.domain1.lan" from eth0/10.0.0.151#58470 (ID 9977176, FTL 84021, src/dnsmasq/forward.c:1815)
2024-10-17 02:33:13.758 UTC [1023M] DEBUG_QUERIES: host1.domain1.lan is not known
2024-10-17 02:33:13.766 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in antigravity (exact): no
2024-10-17 02:33:13.766 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in gravity (exact): no
2024-10-17 02:33:13.767 UTC [1023M] DEBUG_QUERIES: DNS cache: A/10.0.0.151/host1.domain1.lan is not blocked (domainlist ID: -1)
2024-10-17 02:33:13.767 UTC [1023M] DEBUG_QUERIES: **** forwarded host1.domain1.lan to 10.0.0.10#53 (ID 9977176, src/dnsmasq/forward.c:559)
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: **** host1.domain1.lan externally blocked (ID 9977176, FTL 84021, /app/src/dnsmasq/rfc1035.c:797)
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: DNS cache: A/10.0.0.151/host1.domain1.lan is blocked upstream with NXDOMAIN and unset RA bit, expires in 86017s
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: Set reply to NXDOMAIN (2) in src/dnsmasq_interface.c:2731
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: **** got upstream reply from 10.0.0.10#53: host1.domain1.lan is blocked due to upstream response (header) (ID 9977176, src/dnsmasq/rfc1035.c:802)
2024-10-17 02:33:13.770 UTC [1023M] DEBUG_QUERIES: Preparing reply for "host1.domain1.lan", EDE: N/A (-1)
2024-10-17 02:33:13.770 UTC [1023M] DEBUG_QUERIES: Adding RR: "host1.domain1.lan A 0.0.0.0"
2024-10-17 02:33:13.770 UTC [1023M] DEBUG_QUERIES: **** got cache reply: host1.domain1.lan is 0.0.0.0 (ID 9977176, src/dnsmasq_interface.c:404)
2024-10-17 02:33:13.778 UTC [1023M] DEBUG_QUERIES: **** new UDP IPv4 query[AAAA] query "host1.domain1.lan" from eth0/10.0.0.151#45799 (ID 9977177, FTL 84022, src/dnsmasq/forward.c:1815)
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: host1.domain1.lan is not known
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in antigravity (exact): no
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in gravity (exact): no
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: DNS cache: AAAA/10.0.0.151/host1.domain1.lan is not blocked (domainlist ID: -1)
2024-10-17 02:33:13.780 UTC [1023M] DEBUG_QUERIES: **** forwarded host1.domain1.lan to 10.0.0.10#53 (ID 9977177, src/dnsmasq/forward.c:559)
2024-10-17 02:33:13.781 UTC [1023M] DEBUG_QUERIES: **** host1.domain1.lan externally blocked (ID 9977177, FTL 84022, /app/src/dnsmasq/rfc1035.c:797)
2024-10-17 02:33:13.781 UTC [1023M] DEBUG_QUERIES: DNS cache: AAAA/10.0.0.151/host1.domain1.lan is blocked upstream with NXDOMAIN and unset RA bit, expires in 86017s
2024-10-17 02:33:13.781 UTC [1023M] DEBUG_QUERIES: Set reply to NXDOMAIN (2) in src/dnsmasq_interface.c:2731
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: **** got upstream reply from 10.0.0.10#53: host1.domain1.lan is blocked due to upstream response (header) (ID 9977177, src/dnsmasq/rfc1035.c:802)
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: Preparing reply for "host1.domain1.lan", EDE: N/A (-1)
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: Adding RR: "host1.domain1.lan AAAA ::"
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: **** got cache reply: host1.domain1.lan is :: (ID 9977177, src/dnsmasq_interface.c:439)
2024-10-17 02:33:13.787 UTC [1023M] DEBUG_QUERIES: **** new UDP IPv4 query[MX] query "host1.domain1.lan" from eth0/10.0.0.151#44066 (ID 9977178, FTL 84023, src/dnsmasq/forward.c:1815)
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: host1.domain1.lan is not known
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in antigravity (exact): no
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in gravity (exact): no
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: DNS cache: MX/10.0.0.151/host1.domain1.lan is not blocked (domainlist ID: -1)
2024-10-17 02:33:13.789 UTC [1023M] DEBUG_QUERIES: **** forwarded host1.domain1.lan to 10.0.0.10#53 (ID 9977178, src/dnsmasq/forward.c:559)
2024-10-17 02:33:13.790 UTC [1023M] DEBUG_QUERIES: **** host1.domain1.lan externally blocked (ID 9977178, FTL 84023, /app/src/dnsmasq/rfc1035.c:797)
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: DNS cache: MX/10.0.0.151/host1.domain1.lan is blocked upstream with NXDOMAIN and unset RA bit, expires in 86017s
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: Set reply to NXDOMAIN (2) in src/dnsmasq_interface.c:2731
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: **** got upstream reply from 10.0.0.10#53: host1.domain1.lan is blocked due to upstream response (header) (ID 9977178, src/dnsmasq/rfc1035.c:802)
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: Preparing reply for "host1.domain1.lan", EDE: N/A (-1)
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: **** got cache reply: host1.domain1.lan is (NODATA) (ID 9977178, src/dnsmasq_interface.c:457)
My read of what's happening here:
- 10.0.0.10, the upstream server set in the conditional forwarding for domain1.lan, is a PowerDNS Authoritative DNS server.
- It receives DDNS updates from the Kea DHCP server responsible for the LAN with the domain domain1.lan, so that it is able to respond to both domain1.lan A queries and 0.0.10.in-addr.arpa PTR queries.
- The PowerDNS Authoritative server does not do recursion, hence the unset RA bit.
- Pi-Hole interprets an NXDOMAIN with unset RA bit as the name being blocked upstream, is satisfied with that and considers the domain blocked for itself as well.
- However (and this gets lost here), the PowerDNS server set the AA bit because it is authoritative for domain1.lan. It knows for sure this name doesn't exist.
In this case the upstream server behaves correctly, because it doesn't have an entry for this host but it also cannot do recursion. It also doesn't need to, because it is authoritative for domain1.lan.
This is what the query looks like towards 10.0.0.10 using dig:
; <<>> DiG 9.18.28 <<>> host1.domain1.lan @10.0.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 50978
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;host1.domain1.lan. IN A
;; AUTHORITY SECTION:
domain1.lan. 3600 IN SOA dns.domain1.lan. hostmaster.domain1.lan. 2024101608 10800 3600 604800 3600
;; Query time: 22 msec
;; SERVER: 10.0.0.10#53(10.0.0.10) (UDP)
;; WHEN: Wed Oct 16 22:53:51 EDT 2024
;; MSG SIZE rcvd: 111
I believe the root cause here is that PiHole should only consider a domain blocked upstream if neither the RA nor the AA bit is set. If the AA bit is set, PiHole should treat any NXDOMAIN response as authoritatively non-existent rather than blocked.
For comparison, here is a response from 9.9.9.9 for a known Malware domain that this server blocks:
; <<>> DiG 9.18.28 <<>> 1312services.ru @9.9.9.9
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 29040
;; flags: qr rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;1312services.ru. IN A
;; Query time: 29 msec
;; SERVER: 9.9.9.9#53(9.9.9.9) (UDP)
;; WHEN: Wed Oct 16 23:00:36 EDT 2024
;; MSG SIZE rcvd: 44
No RA bit but also no AA bit. It's probably fine to continue considering this type of response as "blocked externally".
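The rule proposed above can be written as a small check on the 16-bit flags word of the DNS header. The Python sketch below is an illustration of that rule only, not the fix that went into FTL. The bit masks follow the standard DNS header layout, while the function name and return labels are hypothetical. The two sample values encode the flags from the dig outputs above (qr aa rd and qr rd ad, both with status NXDOMAIN).

```python
# Standard DNS header flag masks (16-bit flags word)
FLAG_QR = 0x8000  # response
FLAG_AA = 0x0400  # authoritative answer
FLAG_RD = 0x0100  # recursion desired
FLAG_RA = 0x0080  # recursion available
FLAG_AD = 0x0020  # authenticated data
RCODE_MASK = 0x000F
RCODE_NXDOMAIN = 3


def classify_nxdomain(flags: int) -> str:
    """Classify a reply by rcode, AA and RA (hypothetical labels)."""
    if (flags & RCODE_MASK) != RCODE_NXDOMAIN:
        return "not-nxdomain"
    if flags & FLAG_AA:
        # The server is authoritative: the name really does not exist.
        return "authoritative-nxdomain"
    if not (flags & FLAG_RA):
        # Neither AA nor RA set: keep treating this as blocked upstream.
        return "blocked-upstream"
    return "nxdomain"  # ordinary NXDOMAIN from a recursive resolver


# flags: qr aa rd, status NXDOMAIN (reply from 10.0.0.10)
powerdns_reply = FLAG_QR | FLAG_AA | FLAG_RD | RCODE_NXDOMAIN
# flags: qr rd ad, status NXDOMAIN (reply from 9.9.9.9)
quad9_reply = FLAG_QR | FLAG_RD | FLAG_AD | RCODE_NXDOMAIN
```

Under this rule the PowerDNS reply is passed on as a plain NXDOMAIN, and the 9.9.9.9 reply is still counted as blocked externally.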
Thank you, this is about what I was assuming. Also thank you very much for the proposed fix already :-)
I will review/verify this after returning from work today (it's still earlyish morning on this side of the planet)
Bugfix has been merged