Inconsistent EDE data in dnstap
- Program: dnsdist
- Issue type: Bug report
Short description
Inconsistent EDE information in dnstap messages compared with what is returned in the query response
Environment
- Operating system: redacted
- Software version: redacted
- Software source: pdns-rec 4.9.5, dnsdist (a recent build, less than 30 days old, IIRC)
I have been looking at DNSSEC errors (coincidentally, in Amsterdam) for a day or so, trying to figure out which classes of errors handed back in EDE result in a SERVFAIL to the end user. I've trimmed down the error set by excluding "No Reachable Authority" errors, which are rampant.
Here is the set from 24 hours, excluding "No Reachable Authority", from a small subsection of our AMS cluster (the second column is the count for each EDE purpose):
```
┌─event.responseData.opt.ede.purpose─┬─errortype─┐
│ ['Network Error']                  │        11 │
│ ['Unsupported DNSKEY Algorithm']   │      1520 │
│ ['Signature Expired']              │     37544 │
│ ['DNSKEY Missing']                 │     49717 │
│ ['RRSIGs Missing']                 │     56164 │
│ ['NSEC Missing']                   │     60216 │
│ ['Synthesized']                    │     75301 │
│ ['DNSSEC Bogus']                   │     78056 │
│ ['Other Error']                    │    277917 │
└────────────────────────────────────┴───────────┘
```
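The purpose strings in this table appear to be the names from the EDE INFO-CODE registry defined in RFC 8914, where "Other Error" is code 0 and "No Reachable Authority" is code 22. A minimal Python lookup covering the labels above might look like this; the `EDE_INFO_CODES` and `ede_label` names are hypothetical, and "Synthesized" is left out because it is not one of the codes defined in RFC 8914 itself.

```python
# Subset of the EDE INFO-CODE registry (RFC 8914, section 4) covering the
# purposes in the table, plus code 22 for comparison. Hypothetical helper.
EDE_INFO_CODES = {
    0: "Other Error",
    1: "Unsupported DNSKEY Algorithm",
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    9: "DNSKEY Missing",
    10: "RRSIGs Missing",
    12: "NSEC Missing",
    22: "No Reachable Authority",
    23: "Network Error",
}


def ede_label(code: int) -> str:
    """Return the registry name for an EDE INFO-CODE, or a placeholder if unmapped."""
    return EDE_INFO_CODES.get(code, f"Unknown ({code})")
```

With this mapping, every "Other Error" row in the table corresponds to INFO-CODE 0 in the logged responses.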
So what are all those "Other Error" items? That is an unusually large number for the catch-all category.
I dug into this a bit, and either I need a sanity check or this is a bug.
I found a name that shows up as "Other Error" in the dnstap data set: tracker.publicbt.com. There are ~6000 of those in one of my log files, so I figured it would be a good test case.
DNSViz shows this as a REFUSED error, and sure enough, when I query it with dig I get a "No Reachable Authority" result:
```
jtodd@dev01:~$ dig @9.9.9.9 tracker.publicbt.com

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> @9.9.9.9 tracker.publicbt.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 14781
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 22 (No Reachable Authority): (delegation publicbt.com)
;; QUESTION SECTION:
;tracker.publicbt.com. IN A

;; Query time: 0 msec
;; SERVER: 9.9.9.9#53(9.9.9.9) (UDP)
;; WHEN: Sun May 26 16:48:38 UTC 2024
;; MSG SIZE rcvd: 78

jtodd@dev01:~$
```
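To cross-check the dnstap data against live queries for more than one name, the EDE line that dig prints can be extracted programmatically. This is a minimal sketch that assumes the output format shown above from dig 9.18; other dig versions may render EDE differently, and `parse_dig_ede` is a hypothetical helper.

```python
import re

# Matches dig's rendering of an EDE option as seen above, e.g.
# "; EDE: 22 (No Reachable Authority): (delegation publicbt.com)"
EDE_LINE = re.compile(r"^; EDE: (\d+)(?: \(([^)]*)\))?(?:: (.*))?$")


def parse_dig_ede(output: str) -> list[tuple[int, str, str]]:
    """Return (info_code, purpose, extra_text) for every EDE line in dig output."""
    results = []
    for line in output.splitlines():
        m = EDE_LINE.match(line.strip())
        if m:
            code, purpose, extra = m.groups()
            results.append((int(code), purpose or "", extra or ""))
    return results
```

Applied to the output above, this yields a single entry with INFO-CODE 22, which can then be compared with the code recorded in dnstap for the same name.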
But when I look through the dnstap logs, these responses are not listed as "No Reachable Authority"; they show up as "Other Error" (info code 0). Even though the name appears hundreds of times, I find no events in the dnstap output that show "No Reachable Authority" for it. All of the errors are "Other Error", which does not match what I see in my actual query results.
I am collecting this data from dnstap, which is sent by dnsdist. pdns-rec sits behind dnsdist, along with (as usual) Unbound, which we currently do not have configured to send EDE. Unbound answers therefore never carry any EDE data and are not considered in my searches.
Is this a dnsdist bug in its dnstap output, or a problem with my method?
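One way to narrow down whether the mismatch originates in dnsdist or in the collection pipeline is to decode the EDE option directly from the raw DNS response carried in each dnstap message (the `response_message` field in the dnstap schema), bypassing whatever mapping produces `event.responseData.opt.ede.purpose`. The sketch below assumes you already have those bytes from a dnstap reader of your choice; `ede_from_response` is a hypothetical helper that follows the OPT (RFC 6891) and EDE (RFC 8914, option code 15) wire formats and does no validation of truncated or malformed messages.

```python
import struct

EDE_OPTION_CODE = 15  # RFC 8914
OPT_RR_TYPE = 41      # RFC 6891


def _skip_name(msg: bytes, pos: int) -> int:
    """Advance past a (possibly compressed) domain name starting at pos."""
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return pos + 2
        pos += 1 + length


def ede_from_response(msg: bytes) -> list[tuple[int, str]]:
    """Return (info_code, extra_text) for each EDE option in a wire-format DNS message."""
    qd, an, ns, ar = struct.unpack("!4H", msg[4:12])
    pos = 12
    for _ in range(qd):
        pos = _skip_name(msg, pos) + 4  # skip QTYPE + QCLASS
    for _ in range(an + ns):
        pos = _skip_name(msg, pos)
        rdlen = struct.unpack("!H", msg[pos + 8:pos + 10])[0]
        pos += 10 + rdlen  # TYPE, CLASS, TTL, RDLENGTH, then RDATA
    found = []
    for _ in range(ar):
        pos = _skip_name(msg, pos)
        rtype, _cls, _ttl, rdlen = struct.unpack("!HHIH", msg[pos:pos + 10])
        pos += 10
        rdata_end = pos + rdlen
        if rtype == OPT_RR_TYPE:
            opt = pos
            # OPT RDATA is a sequence of {OPTION-CODE, OPTION-LENGTH, OPTION-DATA}
            while opt + 4 <= rdata_end:
                code, olen = struct.unpack("!HH", msg[opt:opt + 4])
                data = msg[opt + 4:opt + 4 + olen]
                if code == EDE_OPTION_CODE and len(data) >= 2:
                    info = struct.unpack("!H", data[:2])[0]
                    found.append((info, data[2:].decode("utf-8", "replace")))
                opt += 4 + olen
        pos = rdata_end
    return found
```

If the decoded INFO-CODE for tracker.publicbt.com is already 0 in the logged bytes, the discrepancy is present in what dnsdist emitted; if it is 22, the problem lies further down the collection pipeline.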