Bogus validation result of NXDOMAIN detached from DNSSEC zone
- [X] This is not a support question, I have read about opensource and will send support questions to the IRC channel, GitHub Discussions or the mailing list.
- [X] I have read and understood the 'out in the open' support policy
- [X] I have read and understood the PowerDNS AI policy
- Program: Recursor
- Issue type: Bug report
Short description
Un-delegated (non-existent) subdomains are not NSEC-validated when the parent zone uses a different forwarder.
Environment
- Software version: 5.1
- Software source: compiled
Steps to reproduce
dnssec:
  validation: process
recursor:
  forward_zones_recurse:
    - zone: .
      forwarders:
        - 1.1.1.1
    - zone: local
      forwarders:
        - 1.1.1.1
This doesn't make much sense in this form, but it is the minimal reproducer. My real use case is deflecting some queries from misconfigured clients to a secondary forwarder with logging and more logic to handle them, sparing the primary one. While it's possible to add NTAs for them, this should not be required, as the replies are signed.
Expected behaviour
Response valid with AD bit set. Similar, but not split domain result:
$ dig @127.0.0.1 -p 5346 test
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 23126
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 29: (Result synthesized by root-nx-trust)
;; QUESTION SECTION:
;test. IN A
;; AUTHORITY SECTION:
. 3597 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2025112301 1800 900 604800 86400
Actual behaviour
EDE: 12
$ dig @127.0.0.1 -p 5346 local
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 50795
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 12 (NSEC Missing)
;; QUESTION SECTION:
;local. IN A
Other information
The second forwarder returns 2 pairs of NSEC+RRSIG, as expected (from +DO, confirmed with tcpdump); however, they are attached to the parent (".") rather than "local." and apparently are not subject to validation.
In the test case, since both forwarders point to the same server, this obviously could be validated.
I don't think I can glue these with a DS record, as there isn't one, but I'd expect pdns_recursor to follow the chain of trust (parent keys) even when the forwarded domain itself is non-existent.
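To illustrate why those two NSEC records would be enough to prove the NXDOMAIN if they were accepted, here is a minimal Python sketch of the canonical-order coverage check from RFC 4034 section 6.1. The owner names are the ones visible in the trace quoted later in this thread (`. NSEC aaa.` and `loans. NSEC locker.`). The helper names `canonical_key` and `nsec_covers` are made up for this example, escaped dots inside labels are not handled, and this is not how the recursor implements it.

```python
def canonical_key(name):
    """Sort key for RFC 4034 canonical ordering: labels reversed, lowercased, as bytes."""
    labels = [label for label in name.rstrip(".").lower().split(".") if label]
    return [label.encode("ascii") for label in reversed(labels)]


def nsec_covers(owner, next_name, qname):
    """Return True if qname falls strictly between owner and next_name."""
    o, n, q = canonical_key(owner), canonical_key(next_name), canonical_key(qname)
    if o < n:
        return o < q < n
    # The last NSEC of a zone points back to the apex, so the range wraps around.
    return q > o or q < n


# 'local.' sorts between 'loans.' and 'locker.', so the name itself does not exist.
print(nsec_covers("loans.", "locker.", "local."))
# The wildcard '*.' sorts between the root and 'aaa.', so no wildcard can match either.
print(nsec_covers(".", "aaa.", "*."))
```

Both checks succeed, which matches the observation that the signed proof is present in the response and only its acceptance is the problem.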
The only thing I've found that might be related is https://github.com/PowerDNS/pdns/pull/7238.
BTW the SOA record in response is missing too, even with +noad option.
Can you show a trace? Two ways to do that:
- Start recursor with --trace on the command line, do the queries and attach the resulting log here.
- With a running resolver: rec_control trace-regex . /tmp/tracefile then do the query and then rec_control trace-regex to switch tracing off.
Sure: recursor-16540.trace.log
I think the problem here is that we have an entry telling us that 1.1.1.1 is authoritative for local, but when we query it for local/A we get an answer from the . zone, so we discard it because at this point we don't know that 1.1.1.1 is authoritative for . as well, and thus accepting these records would be an issue.
We then do a second query to get the DS for local, retrieving the entry telling us that 1.1.1.1 is authoritative for ., and then we accept the entry telling us that local does not exist but it's too late.
I think that being authoritative for signatures shouldn't matter: after all, they are about to (should) be validated. I.e. routing the queries is distinct from validating the chain of trust. Splitting the domain into separate zones might route the level-by-level queries to different resolvers, but as long as they are consistent, this should not break DNSSEC by cutting the logic into separate scopes. Unless the origin mangles responses deliberately, of course (e.g. geofencing).
I mean that it also shouldn't break with local. being forwarded to 8.8.8.8, as long as the signatures returned from the deeper level match the keys of the upper one. Seems like the decision to discard should be postponed...
In my case this is a follow-up of https://github.com/orgs/PowerDNS/discussions/14961 - everything seems to work until I hit a DNSSEC-signed parent with an NX child. I don't even mind it being stripped of AD, but now I get SERVFAIL and must manually intervene (NTA or adjusting routing).
> Seems like decision to discard should be postponed...
That would open the door to denial of service, unfortunately.
I was afraid of that (or some poisoning). I wouldn't dare to suggest adding some special flag for such rare case...
But this is relevant only on defined zones boundaries (only upper boundary actually) - maybe something addTA()-like for glueing them together?
I was indeed wondering: if local is signed and in your management, it might work to add an explicit TA for it.
It's not signed - it still doesn't exist and the proof comes from the root ("."). All I want is to route such queries away to a different resolver.
In reality I have a dozen zones excluded from . and handled by another recursor. Simply a matter of traffic distribution...
Rationale: . is forwarded to Quad9 and I want to spare their service the local, test, example, invalid, internal, onion, alt and localhost reserved domains (among others, like arpa) by redirecting them to a secondary straight recursor.
The front recursor is validating DNSSEC, while the secondary only does process-no-validate. I might put dnsdist in front to do this job (with other caveats maybe), but the question remains - shouldn't the recursor itself handle DNSSEC across zone boundaries?
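The routing described above amounts to a longest-suffix match of the query name against the configured forward zones. The following Python sketch shows that selection under stated assumptions: the zone list is abbreviated, 9.9.9.9 stands in for Quad9, 192.0.2.53 is a hypothetical documentation address for the secondary recursor, and `pick_forward_zone` is an illustrative name rather than anything from the recursor's code.

```python
# Hypothetical forward table; only the zone names come from the comment above.
FORWARD_ZONES = {
    ".": ["9.9.9.9"],             # Quad9
    "local.": ["192.0.2.53"],     # hypothetical secondary recursor
    "invalid.": ["192.0.2.53"],
    "test.": ["192.0.2.53"],
}


def labels(name):
    """Split a dotted name into lowercase labels; the root yields an empty list."""
    return [label for label in name.rstrip(".").lower().split(".") if label]


def pick_forward_zone(qname, zones):
    """Return the most specific configured zone that qname is at or below."""
    q = labels(qname)
    best, best_len = None, -1
    for zone in zones:
        z = labels(zone)
        # A zone matches when its labels are a suffix of the query name's labels.
        if len(z) <= len(q) and q[len(q) - len(z):] == z and len(z) > best_len:
            best, best_len = zone, len(z)
    return best


for name in ("_ldap.some.corp.local.", "deleted.invalid.", "example.com."):
    zone = pick_forward_zone(name, FORWARD_ZONES)
    print(name, "->", zone, FORWARD_ZONES[zone])
```

The point of the sketch is that the routing decision is made purely on the name, independently of which zone actually signs the denial of existence.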
Can you set:
recursor:
  root_nx_trust: false
and check if this improves things for you?
That doesn't work
Remi is right in this comment. As we don't trust records from . from the local forward, we discard them:
Nov 24 14:31:41 [1] local: Got 7 answers from (empty) (2606:4700:4700::1111), rcode=3 (Non-Existent domain), aa=0, in 9ms
Nov 24 14:31:41 [1] local: Removing record '.|SOA|a.root-servers.net. nstld.verisign-grs.com. 2025112400 1800 900 604800 86400' in the AUTHORITY section received from local
Nov 24 14:31:41 [1] local: Removing record '.|RRSIG|SOA 8 0 86400 20251207050000 20251124040000 61809 . UEKmb8LpYKMNcfc5BC8fk6/0hO3K9nKwF9yN5/o7cg9Fzb/xa3FQEZQSJHJu7CF5jAjhuUzVRMeGDjtPDkZPXJXA1qaxujGgoTca+hUirTGSrIjj1PurEU6PShREiAiZ1lVv2Cjje0zNzJOitlR3ZFYkMNdTNXcgZ5PCC+3SE8JkalrVbu4smaRH+yPH52Z0/bGK4eJCtObwaXNeNJYqOKPHAaJSnBtKjV+l9fF+l/O/19H0SiyFN+PSj/zzFoYNZYpQVW7OWSxQMYFhadKj4aIzxgW84p5b9hZXqHQMmXjpawvwQKS0ZfQfiP0pm2NGzS5UGp/ndUXwq2jPeWOrGg==' in the AUTHORITY section received from local
Nov 24 14:31:41 [1] local: Removing record '.|NSEC|aaa. NS SOA RRSIG NSEC DNSKEY ZONEMD' in the AUTHORITY section received from local
Nov 24 14:31:41 [1] local: Removing record 'loans|NSEC|locker. NS DS RRSIG NSEC' in the AUTHORITY section received from local
Nov 24 14:31:41 [1] local: Removing record '.|RRSIG|NSEC 8 0 86400 20251207050000 20251124040000 61809 . YaikLmSny2k/Y8u1KaTT0bhm/O1AtbwN8wa2s3yOdj8vLOOKyFI8qkunX2CgbcqHhmhoP76uAvuFxnWJEKNGcVRsBFz5NuvXpuu9I0Z40iBd739RmH0fgBqutlIccRoQ0VlHOgbHNRdlk+Qds8zktNNlS28To01Jg1GeS7Bh+a4kegVYRXpSicFfjnRcSWr9ZGcrV6ryYB+tLLoT60Y/B3rE9Z9DDaW/z0syX8uIpsJk4M1XU0gmCvJpn0cilvKWzyy1PmJcnL+UR91ICSWEVQZnyjlgDyY0ZabtpZpQNGmO3FtQsWKyrvKFtoHody6L7LBXPOy2pjJzopIrCbSzDw==' in the AUTHORITY section received from local
Nov 24 14:31:41 [1] local: Removing record 'loans|RRSIG|NSEC 8 1 86400 20251207050000 20251124040000 61809 . t5r8bUnoWAHhuC16TR0a84nEz07oMzIYb4ZS4rXMTBa73BUcSspylAD9n/f2l5HIeve/qJncvx8ipEsNMDTG2wWoqwqbeK9sbNHqleZg3OuU8xbi/ImhLFmjyOUrSinucQqu6wtQaTMTVIs+fLn7oInkpaBTNoWNS77QnuvA5cqBBdkLlwKnkClheIaHJKnMRxGyGYi85aoSNAROQafACPF9pkRYFyhXZqBCMs4mHnHHWpZJfgx9TO7QiQ3KI+6/h1fluJ0vkW1W3J4xCRZlYxKvDo3K9m/4Vz2AC3ALW5ocOtHV2wMnaxhULHWVVomHstIw4n/3WRA0ZtMIc+umdw==' in the AUTHORITY section received from local
Nov 24 14:31:41 [1] local: OPT answer '.' from 'local' nameservers
Nov 24 14:31:41 [1] local: Determining status after receiving this packet
Nov 24 14:31:41 [1] local: Status=NXDOMAIN, we are done
Nov 24 14:31:41 [1] local: No or invalid signature/proof for local, we likely missed a cut between . and local, looking for it
Because we remove all the wrong records from the packet (as they are from a parent (the root) and not the level we're looking at), we end up without a SOA and RRSIGs to validate. Hence the bogus response.
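The effect of that removal step can be sketched in a few lines of Python. This is a simplified illustration of an in-bailiwick filter, assuming the rule is "keep only records whose owner name is at or below the forwarded zone"; the function names are invented for the example and the real sanitization in the recursor is more involved.

```python
def labels(name):
    """Split a dotted name into lowercase labels; the root yields an empty list."""
    return [label for label in name.rstrip(".").lower().split(".") if label]


def is_at_or_below(name, zone):
    """Return True if name equals zone or is a descendant of it."""
    n, z = labels(name), labels(zone)
    return len(z) <= len(n) and n[len(n) - len(z):] == z


def sanitize_authority(records, forward_zone):
    """Split (owner, type) records into those kept and those removed."""
    kept, removed = [], []
    for owner, rtype in records:
        target = kept if is_at_or_below(owner, forward_zone) else removed
        target.append((owner, rtype))
    return kept, removed


# Owner names and types of the AUTHORITY records shown in the trace above.
authority = [
    (".", "SOA"), (".", "RRSIG"), (".", "NSEC"),
    ("loans.", "NSEC"), (".", "RRSIG"), ("loans.", "RRSIG"),
]

kept, removed = sanitize_authority(authority, "local.")
print(len(kept), len(removed))   # nothing survives for the 'local.' forward

kept, removed = sanitize_authority(authority, ".")
print(len(kept), len(removed))   # everything is in bailiwick for the '.' forward
```

With the same packet, the outcome depends only on which forward zone the answer is attributed to, which is why the identical response validates for `test` but not for `local`.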
An NTA would be the 'correct' fix. Setting up a server to properly respond with a local. SOA would also work.
Would you mind sharing the configuration from your actual use-case? Having a small reproducer usually helps a lot, but in this case the resulting setup is a bit messed up because the entire zone we are forwarding for does not exist.
Although it looks synthetic, I do forward these 8 non-existent (reserved) zones on purpose (rationale above). The actual config forwards them to a secondary "straight" recursor and . to dnsdist, but this part doesn't really matter.
On the client side - I see a lot of mDNS leaking into the recursor (customers doing a lot of SRV _ldap.some.corp.local or A deleted.invalid), which ends up in dnsdist unless deflected. At first I was about to create a feature request to handle them internally (just like rfc1918 and rfc6303, possibly with rfc6598), but then I thought I might simply recurse them to get signed negative responses. This turned out not to work and doesn't seem to be documented.
The doc mentions NTA and DS glueing, but this case requires zone up-merging.
So for my actual case - I can work around this if there's no better option. The problem seems to be limited to the NSEC of the entire forwarded zone (this is when the parent zone attests), so not much harm is done by SERVFAIL instead of NXDOMAIN with AD.
If you don't see any other risks in this behavior and improving it isn't easy or worth the effort, we might close it as not planned.