trafficcontrol
Tr fix nxdomain
Related: #7082
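This PR uses a somewhat hacky solution for issue #7082. It checks the NS record for a given FQDN to determine whether the name is truly an NXDOMAIN (does not exist at all). Previously, NXDOMAIN was returned whenever there was no record of the requested type.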
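The distinction the fix relies on (RFC 2308) can be sketched as follows. If a name owns no records, the answer is NXDOMAIN. If the name exists but lacks the requested type, the answer is NOERROR with an empty answer section (NODATA).

This is a minimal illustration under stated assumptions, not the PR's code:
- `NegativeAnswerSketch`, `classify`, and the example zone are hypothetical.
- The PR uses the presence of an NS record for the FQDN as its existence check. The sketch uses the more general "owns any record" test and ignores empty non-terminals.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of the NXDOMAIN-vs-NODATA decision (RFC 2308); not Traffic Router's code.
public class NegativeAnswerSketch {
    enum Outcome { ANSWER, NODATA, NXDOMAIN }

    // zone maps each owner name to the record types it owns; empty non-terminals are ignored.
    static Outcome classify(Map<String, Set<String>> zone, String qname, String qtype) {
        Set<String> types = zone.get(qname);
        if (types == null || types.isEmpty()) {
            // The name owns no records at all, so it really does not exist.
            return Outcome.NXDOMAIN;
        }
        // The name exists: answer if the type is present, otherwise NOERROR with an empty answer.
        return types.contains(qtype) ? Outcome.ANSWER : Outcome.NODATA;
    }

    public static void main(String[] args) {
        // Hypothetical delivery service name owning A and NS records but no AAAA.
        Map<String, Set<String>> zone = Map.of("ds.cdn.example.com", Set.of("A", "NS"));
        System.out.println(classify(zone, "ds.cdn.example.com", "A"));
        System.out.println(classify(zone, "ds.cdn.example.com", "AAAA"));
        System.out.println(classify(zone, "nope.cdn.example.com", "AAAA"));
    }
}
```

Run as-is, this prints `ANSWER`, `NODATA`, and `NXDOMAIN`: an AAAA query for an existing name yields NODATA instead of NXDOMAIN.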
Which Traffic Control components are affected by this PR?
- Traffic Router
What is the best way to verify this PR?
Apply the PR and perform DNS lookups of the Delivery Service FQDN for A, NS, and AAAA records. The A and NS queries return valid answers; the AAAA query returns NOERROR with an empty answer section.
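To script this check instead of reading `dig` output, a small UDP probe can send A, NS, and AAAA queries. It reports the RCODE and answer count from each response header (RFC 1035 wire format).

This is a sketch using only the JDK, with some limits:
- `DnsCheck` and its methods are hypothetical names.
- It ignores truncation, EDNS, and TCP fallback.
- It only inspects the 12-byte header.

```java
import java.io.ByteArrayOutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Minimal UDP DNS probe: prints NOERROR/NXDOMAIN and answer count per query type.
public class DnsCheck {
    static final int TYPE_A = 1, TYPE_NS = 2, TYPE_AAAA = 28;
    static final int QUERY_ID = 0x1234;

    // Build a query with one question; RD is set, as dig does by default.
    static byte[] buildQuery(int id, String name, int qtype) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(id >> 8); out.write(id);          // ID (write() keeps the low 8 bits)
        out.write(0x01); out.write(0x00);           // flags: opcode QUERY, RD=1
        out.write(0); out.write(1);                 // QDCOUNT = 1
        for (int i = 0; i < 6; i++) out.write(0);   // ANCOUNT, NSCOUNT, ARCOUNT = 0
        for (String label : name.split("\\.")) {    // QNAME as length-prefixed labels
            if (label.isEmpty()) continue;          // tolerate a trailing dot
            byte[] b = label.getBytes(StandardCharsets.US_ASCII);
            out.write(b.length);
            out.write(b, 0, b.length);
        }
        out.write(0);                               // root label terminates QNAME
        out.write(qtype >> 8); out.write(qtype);    // QTYPE
        out.write(0); out.write(1);                 // QCLASS = IN
        return out.toByteArray();
    }

    // RCODE is the low nibble of header byte 3; ANCOUNT is header bytes 6-7.
    static String classify(byte[] resp) {
        int rcode = resp[3] & 0x0F;
        int ancount = ((resp[6] & 0xFF) << 8) | (resp[7] & 0xFF);
        if (rcode == 3) return "NXDOMAIN";
        if (rcode == 0) return ancount == 0 ? "NOERROR (NODATA)" : "NOERROR (" + ancount + " answers)";
        return "RCODE " + rcode;
    }

    static String query(String server, String name, int qtype) throws Exception {
        byte[] q = buildQuery(QUERY_ID, name, qtype);
        try (DatagramSocket sock = new DatagramSocket()) {
            sock.setSoTimeout(3000);
            sock.send(new DatagramPacket(q, q.length, InetAddress.getByName(server), 53));
            byte[] buf = new byte[512];
            DatagramPacket p = new DatagramPacket(buf, buf.length);
            sock.receive(p);
            if (p.getLength() < 12) throw new IllegalStateException("short DNS response");
            int respId = ((buf[0] & 0xFF) << 8) | (buf[1] & 0xFF);
            if (respId != QUERY_ID) throw new IllegalStateException("unexpected DNS message ID");
            return classify(buf);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DnsCheck <traffic-router-host> <delivery-service-fqdn>
        String[] names = {"A", "NS", "AAAA"};
        int[] types = {TYPE_A, TYPE_NS, TYPE_AAAA};
        for (int i = 0; i < types.length; i++) {
            System.out.println(names[i] + ": " + query(args[0], args[1], types[i]));
        }
    }
}
```

With the patch applied, the steps above predict answers for A and NS, and `NOERROR (NODATA)` for AAAA.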
If this is a bugfix, which Traffic Control versions contained the bug?
All stable
PR submission checklist
- [x] This PR has a CHANGELOG.md entry
- [x] This PR DOES NOT FIX A SERIOUS SECURITY VULNERABILITY (see the Apache Software Foundation's security guidelines for details)
I'm trying to test this now to see if this solves my NXDOMAIN issue. Could you add a specific test so I can understand what this tries to fix?
For example, here's what I try:
dig @cdn1cdcrs0001.coxlab.net test.ece.cdn1.coxlab.net A +short
68.1.14.136
68.1.14.145
then, on the AAAA record:
$ dig @cdn1cdcrs0001.coxlab.net test.ece.cdn1.coxlab.net AAAA
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.10 <<>> @cdn1cdcrs0001.coxlab.net test.ece.cdn1.coxlab.net AAAA
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 52941
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available
Hello,
I am far from a PC at the moment. However, if you use TR without the patch, you will see that the reply is NXDOMAIN. That is wrong, because the domain does exist. With the patch applied, TR instead returns NOERROR with no AAAA record in the answer (an empty answer, i.e. NODATA).
The current logic of returning NXDOMAIN breaks certain resolvers: they drop everything they have cached for the name, even A records.
Let's say an IPv4 user requests an A record. TR replies normally, and the DNS resolver (the one the user has configured on their device) caches that response. Next, a second user requests an AAAA record, and TR replies NXDOMAIN. The resolver drops its cached A record and stores the NXDOMAIN for the duration of the TTL. As a result, every later user requesting the A record gets NXDOMAIN until the TTL expires, even though we know the A record exists.
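The cache interaction described above can be modeled with a toy resolver cache. RFC 8020 says names and RRsets at or below an NXDOMAIN node should be considered unreachable, so in this model:
- An NXDOMAIN wipes cached RRsets at and below the name.
- A NODATA entry only covers the one type.

This is an illustration under simplifying assumptions, not any real resolver's implementation. `ToyResolverCache` is hypothetical, and TTLs and expiry are not modeled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the resolver behaviour described above; TTLs are not modeled.
public class ToyResolverCache {
    private final Map<String, Set<String>> positive = new HashMap<>(); // name -> cached RR types
    private final Set<String> nodata = new HashSet<>();                 // "name/type" negative entries
    private final Set<String> nxdomain = new HashSet<>();               // names cached as nonexistent

    private static boolean atOrBelow(String name, String node) {
        return name.equals(node) || name.endsWith("." + node);
    }

    void cacheAnswer(String name, String type) {
        positive.computeIfAbsent(name, k -> new HashSet<>()).add(type);
    }

    // NODATA: "no records of this type"; other cached types are untouched.
    void cacheNodata(String name, String type) {
        nodata.add(name + "/" + type);
    }

    // NXDOMAIN: the name and everything below it is treated as nonexistent,
    // so previously cached RRsets of ANY type (including A) are dropped.
    void cacheNxdomain(String name) {
        positive.keySet().removeIf(n -> atOrBelow(n, name));
        nxdomain.add(name);
    }

    String lookup(String name, String type) {
        for (String node : nxdomain) {
            if (atOrBelow(name, node)) return "NXDOMAIN";
        }
        if (nodata.contains(name + "/" + type)) return "NODATA";
        Set<String> types = positive.get(name);
        return types != null && types.contains(type) ? "HIT" : "MISS";
    }

    public static void main(String[] args) {
        String ds = "test.ece.cdn1.coxlab.net";
        ToyResolverCache unpatched = new ToyResolverCache();
        unpatched.cacheAnswer(ds, "A");   // user 1: A answer cached
        unpatched.cacheNxdomain(ds);      // user 2: AAAA answered with NXDOMAIN
        System.out.println(unpatched.lookup(ds, "A"));

        ToyResolverCache patched = new ToyResolverCache();
        patched.cacheAnswer(ds, "A");
        patched.cacheNodata(ds, "AAAA");  // user 2: AAAA answered with NOERROR/NODATA
        System.out.println(patched.lookup(ds, "A"));
    }
}
```

Running it prints `NXDOMAIN` for the unpatched case, where user 3 is denied the A record. It prints `HIT` for the patched case.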
Here is the RFC clarifying what NXDOMAIN means: https://www.rfc-editor.org/rfc/rfc8020
And here is the RFC that TR fails to fulfill without the patch: https://www.rfc-editor.org/rfc/rfc2308 (see section 2.2 about No Data)
/"NODATA" - a pseudo RCODE which indicates that the name is valid, for the given class, but are no records of the given type./
Sincerely!
Mike
On Dec 29, 2022 13:28:26, Steve Malenfant wrote:
> I'm trying to test this now to see if this solves my NXDOMAIN issue. Could you add a specific test so I can understand what this tries to fix?
@mikeV02 Under which circumstances does this occur? So far, if I create a static DNS record (A) on a delivery service, an AAAA query for it returns NOERROR.
I've got a case where I configure a "*" record for a wildcard: the A query works, but the AAAA query returns NXDOMAIN. I've applied your patch and it still returns NXDOMAIN. I was just wondering how to test your patch.
I have incorporated this patch into a 6.1.x Traffic Router, and it is working fine for me. Here is a query for an AAAA record on a name for which an A record exists but an AAAA record does not:
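For the wildcard case, RFC 4592 treats a name matched by a wildcard as existing. A wildcard that owns only an A record should therefore produce NODATA, not NXDOMAIN, for an AAAA query. The sketch below illustrates that closest-encloser lookup on a hypothetical zone.

Assumptions:
- The `*.ece.cdn1.coxlab.net` record and the `WildcardSketch` class are for illustration only.
- Empty non-terminals are derived from owner names.
- This does not describe how Traffic Router actually resolves wildcard static DNS entries.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of RFC 4592 wildcard matching for negative answers; not Traffic Router code.
public class WildcardSketch {
    // A name exists if it owns records or is an ancestor of an owner (empty non-terminal).
    static boolean nameExists(Map<String, Set<String>> zone, String name) {
        if (zone.containsKey(name)) return true;
        for (String owner : zone.keySet()) {
            if (owner.endsWith("." + name)) return true;
        }
        return false;
    }

    static String classify(Map<String, Set<String>> zone, String qname, String qtype) {
        if (zone.containsKey(qname)) {
            return zone.get(qname).contains(qtype) ? "ANSWER" : "NODATA";
        }
        if (nameExists(zone, qname)) return "NODATA"; // empty non-terminal
        // Closest encloser: the longest existing ancestor of qname.
        String ce = qname;
        while (!nameExists(zone, ce) && ce.indexOf('.') >= 0) {
            ce = ce.substring(ce.indexOf('.') + 1);
        }
        Set<String> wild = zone.get("*." + ce);
        if (wild == null) return "NXDOMAIN";
        // A wildcard match means the name exists: a missing type is NODATA, not NXDOMAIN.
        return wild.contains(qtype) ? "ANSWER (synthesized)" : "NODATA";
    }

    public static void main(String[] args) {
        Map<String, Set<String>> zone = Map.of(
            "cdn1.coxlab.net", Set.of("SOA", "NS"),
            "*.ece.cdn1.coxlab.net", Set.of("A"));
        System.out.println(classify(zone, "test.ece.cdn1.coxlab.net", "A"));
        System.out.println(classify(zone, "test.ece.cdn1.coxlab.net", "AAAA"));
        System.out.println(classify(zone, "x.other.cdn1.coxlab.net", "AAAA"));
    }
}
```

For this hypothetical zone the output is:
- `ANSWER (synthesized)` for the A query.
- `NODATA` for the AAAA query, not the NXDOMAIN observed above.
- `NXDOMAIN` for a name with no matching wildcard.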
[root@traffic-router-577b8498d5-dgpxh /]# dig @localhost cdn.4930b852-0b4c-40fc-9238-0d9c2b99fbb0.mycdn.lilac.in AAAA
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.10 <<>> @localhost cdn.4930b852-0b4c-40fc-9238-0d9c2b99fbb0.mycdn.lilac.in AAAA
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25661
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;cdn.4930b852-0b4c-40fc-9238-0d9c2b99fbb0.mycdn.lilac.in. IN AAAA
;; AUTHORITY SECTION:
4930b852-0b4c-40fc-9238-0d9c2b99fbb0.mycdn.lilac.in. 30 IN SOA traffic-router-577b8498d5-dgpxh.mycdn.lilac.in. traffic_ops.4930b852-0b4c-40fc-9238-0d9c2b99fbb0.mycdn.lilac.in. 2023012005 28800 7200 604800 30
;; Query time: 6 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jan 20 05:55:00 UTC 2023
;; MSG SIZE rcvd: 153
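The SOA record in the authority section is what lets a resolver cache this negative answer. Per RFC 2308, the negative-caching TTL is the minimum of the SOA record's own TTL and its MINIMUM field. A small sketch, using a hypothetical `NegativeTtl` helper that parses the dig-formatted SOA line shown above, gives 30 seconds for this trace.

```java
// Sketch: negative-caching TTL from the SOA in a NODATA/NXDOMAIN authority section (RFC 2308).
public class NegativeTtl {
    // Expects a dig-style SOA line:
    // owner TTL class SOA mname rname serial refresh retry expire minimum
    static long negativeTtl(String soaLine) {
        String[] f = soaLine.trim().split("\\s+");
        if (f.length != 11 || !f[3].equals("SOA")) {
            throw new IllegalArgumentException("not a dig-style SOA line: " + soaLine);
        }
        long soaTtl = Long.parseLong(f[1]);   // TTL of the SOA record itself
        long minimum = Long.parseLong(f[10]); // SOA MINIMUM field
        return Math.min(soaTtl, minimum);     // how long the negative answer may be cached
    }

    public static void main(String[] args) {
        String soa = "4930b852-0b4c-40fc-9238-0d9c2b99fbb0.mycdn.lilac.in. 30 IN SOA "
            + "traffic-router-577b8498d5-dgpxh.mycdn.lilac.in. "
            + "traffic_ops.4930b852-0b4c-40fc-9238-0d9c2b99fbb0.mycdn.lilac.in. "
            + "2023012005 28800 7200 604800 30";
        System.out.println(negativeTtl(soa)); // 30
    }
}
```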
Thank you @mikeV02