
OPCODE in dnsdist

Open StenStensonSten opened this issue 1 year ago • 6 comments

  • Program: dnsdist
  • Issue type: Feature request

Short description

Make the OPCODE more visible in dnsdist, for example by showing it in the output of grepq("2000ms").

Usecase

This use case is partly linked to issue https://github.com/PowerDNS/pdns/issues/10624

We run dnsdist as a resolver frontend for our customers (the backends are PowerDNS Recursor). When you start using dnsdist, you run showServers() quite often to check that everything looks OK.

The first thing you notice is that Drops keep increasing, and you wonder what is happening.

showServers()
#   Name   Address             State      Qps  Qlim  Ord  Wt     Queries     Drops  Drate   Lat     TCP  Outstanding  Pools
0   rec01  xxx.160.127.242:53  up      1766.9     0    1   1   804515002   3269144    2.0  18.6  1253.1           75
1   rec02  xxx.160.127.243:53  up      1000.9     0    1   1   805428330   3276349    2.0  51.4  2654.3           79
2   rec03  xxx.208.42.18:53    up       810.9     0    1   1   588556509   2756507    2.0  70.0  1085.7           75
3   rec04  xxx.208.42.19:53    up      1178.9     0    1   1   584137674   2746558    2.0  33.4  1636.4           76
All                                    4754.0                 2782637515  12048558

Then you run grepq("2000ms") and see this (the list is shortened; there are about 50-100 entries for google.com):

-1.1 xxx.122.132.90:42761 DoUDP xxx.160.127.242:53  9574 google.com. A T.O RD No Error. 0 answers
-1.1 xxx.8.60.168:56943   DoUDP xxx.160.127.242:53  5063 google.com. A T.O RD No Error. 0 answers
-1.1 xxx.64.111.10:52370  DoUDP xxx.208.42.18:53   17746 google.com. A T.O RD No Error. 0 answers
-1.1 xxx.64.111.10:38445  DoUDP xxx.208.42.18:53   37625 google.com. A T.O RD No Error. 0 answers
-1.1 xxx.8.60.168:56943   DoUDP xxx.208.42.18:53    5063 google.com. A T.O RD No Error. 0 answers

When you see this for the first time, it is easy to conclude that something is broken. It is not obvious that these queries are being dropped by the pdns-recursor because they have OPCODE=2 (STATUS).

Description

I don't know the best solution, but maybe:

  1. Have separate counters for OPCODE drops.
  2. Update the docs with an example of how to DROP OPCODE=2 queries in dnsdist.
  3. Update the docs with an example of how to rewrite OPCODE=2 to OPCODE=0 in dnsdist.
  4. If the solution is not in dnsdist, maybe revisit https://github.com/PowerDNS/pdns/issues/10624

StenStensonSten, Feb 05 '24 08:02

Hi!

I understand the issue, but I'm unsure what the solution is.

1. Have separate counters for OPCODE drops.

This seems too specific to me: if we go this way, I'm afraid we will end up with so many counters that it becomes impossible to know what's going on. It might even impact performance when collecting metrics.

2. Update the docs with an example of how to DROP OPCODE=2 queries in dnsdist.

Sure, I would merge a pull request adding an example to https://dnsdist.org/rules-actions.html#OpcodeRule
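A minimal sketch of what such a documentation example could look like, assuming the DNSOpcode.Status constant (opcode 2) listed at https://dnsdist.org/reference/constants.html#dnsopcode. Unlike a NotRule() variant, it drops only STATUS queries and leaves every other opcode alone:

```lua
-- Drop queries with opcode STATUS (2) before they are forwarded to any backend.
-- DNSOpcode.Status is assumed from the dnsdist constants reference page.
addAction(OpcodeRule(DNSOpcode.Status), DropAction())
```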

3. Update the docs with an example of how to rewrite OPCODE=2 to OPCODE=0 in dnsdist.

I would really advise against doing something like that: dnsdist tries very hard not to rewrite queries or responses.

4. If the solution is not in dnsdist, maybe revisit [Make recursor reply to queries with OPCODE=2 #10624](https://github.com/PowerDNS/pdns/issues/10624)

Well, it would certainly make the metrics in dnsdist look better, but on the other hand it would mean spending resources to generate and send a response in the recursor, and resources to forward that response in dnsdist, all for a response that is not going to be useful to the client. In theory the solution would be to find out who is sending these nonsense queries and kindly ask them to stop, but that's probably not going to happen. So perhaps we could add an option to the recursor, but this needs to be discussed in #10624.
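If the main goal is to keep these queries away from the recursor while still answering the client, one possible alternative is to let dnsdist reply by itself. This is a sketch assuming RCodeAction and the DNSRCode.NOTIMP constant, with NOTIMP chosen as the conventional answer for an unsupported opcode; it still costs dnsdist one response per query, which a plain DropAction() avoids:

```lua
-- Answer STATUS (opcode 2) queries directly from dnsdist with NOTIMP,
-- so they are never forwarded to the recursor backends.
addAction(OpcodeRule(DNSOpcode.Status), RCodeAction(DNSRCode.NOTIMP))
```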

rgacogne, Feb 05 '24 08:02

  1. Add an opcode column to the grepq output?

Habbie, Feb 05 '24 09:02

Technically we could, but there is already so much information there that it often doesn't fit on a single terminal line, and my completely unscientific feeling is that it would be useful 0.01% of the time.

rgacogne, Feb 05 '24 10:02

We drop these in (early) rules, something like:

addAction(NotRule(OpcodeRule(DNSOpcode.QUERY)), DropAction())

phonedph1, Feb 06 '24 03:02

We drop these in (early) rules, something like:

addAction(NotRule(OpcodeRule(DNSOpcode.QUERY)), DropAction())

Hmm, yes, this looks like a good solution, but when I try to execute it in dnsdist I get the following error:

addAction(NotRule(OpcodeRule(DNSOpcode.QUERY)), DropAction())
Error: [string "return addAction(NotRule(OpcodeRule(DNSOpcode..."]:1: Unable to convert parameter from nil to m
stack traceback:
    [C]: in function 'OpcodeRule'
    [string "return addAction(NotRule(OpcodeRule(DNSOpcode..."]:1: in main chunk

StenStensonSten, Feb 13 '24 12:02

It's DNSOpcode.Query, see https://dnsdist.org/reference/constants.html#dnsopcode

So

addAction(NotRule(OpcodeRule(DNSOpcode.Query)), DropAction())
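A slightly extended sketch of the same rule, assuming it should run early as described above. dnsdist evaluates rules in the order they were added, so in dnsdist.conf this line should come before other addAction() calls; from the console, topRule() moves the most recently added rule to the first position:

```lua
-- Drop every query whose opcode is not QUERY (0), e.g. STATUS (2).
-- Caveat: this also drops NOTIFY and UPDATE, which is fine for a pure
-- resolver frontend but not if those must reach a backend.
addAction(NotRule(OpcodeRule(DNSOpcode.Query)), DropAction())

-- At the console, addAction() appends the rule to the end of the chain;
-- topRule() then moves it to the first position so it is evaluated early.
topRule()
```

Afterwards, showRules() lists each rule with its Matches counter, which gives a per-rule count of these drops without adding a dedicated metric.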

rgacogne, Feb 13 '24 12:02