trafficserver icon indicating copy to clipboard operation
trafficserver copied to clipboard

dns error logging to diags

Open fabikd opened this issue 2 years ago • 2 comments

Logs using SiteThrottled* version is helpful for noisy logs, being used here instead of print statements to help with the possible retries when a dns code occurs

fabikd avatar Aug 09 '22 21:08 fabikd

[approve ci]

ezelkow1 avatar Aug 09 '22 21:08 ezelkow1

@fabikd : can you please run make -j8 format and commit and push the changes? That should satisfy the clang-format CI job.

bneradt avatar Aug 10 '22 16:08 bneradt

[approve ci autest]

bneradt avatar Aug 16 '22 22:08 bneradt

Right, I think it's a great idea to have @SolidWallOfCode look at it. I used the code from the master branch, https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1556 should I have used a different version? I’ve looked at the existing code in the master branch and these 3 labels did show goto Lerror: default, NOTIMP, NOTZONE, and my intent was to leave goto Lerror as is. I do not see goto Lerror on this version of the code https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1556 for FORMERR, NXDOMAIN, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH and it was not my intention to remove goto Lerror on any label.

As for the breaks, usually a switch statement has breaks after each case to avoid passing through the code under the next label. If I am using the correct version of the code, was the intention of it to not break and pass through all the named labels and use the last goto Lerror as a catch all? https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1581 If so, I can add goto Lerror to FORMERR, NXDOMAIN, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH. Just let me know! Thank you!! (:

fabikd avatar Aug 19 '22 22:08 fabikd

I used the code from the master branch, https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1556 should I have used a different version?

Your PR is correctly based on the master branch. Nothing to change there.

I’ve looked at the existing code in the master branch and these 3 labels did show goto Lerror: default, NOTIMP, NOTZONE, and my intent was to leave goto Lerror as is. I do not see goto Lerror on this version of the code https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1556 for FORMERR, NXDOMAIN, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH and it was not my intention to remove goto Lerror on any label.

As for the breaks, usually a switch statement has breaks after each case to avoid passing through the code under the next label.

Developers must intentionally break, return, goto, or not based on the flow of code they desire. Without statements such as these, the runtime path will "fall through" to the following case statements.

If I am using the correct version of the code, was the intention of it to not break and pass through all the named labels and use the last goto Lerror as a catch all? https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1581 If so, I can add goto Lerror to FORMERR, NXDOMAIN, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH. Just let me know! Thank you!! (:

Correct. Unless break statements or the like are used, case statements fall through to next ones. Thus, in this block from master branch (before your change):

    case NOERROR:
    case NXDOMAIN:
    case 6:  // YXDOMAIN
    case 7:  // YXRRSET
    case 8:  // NOTAUTH
    case 9:  // NOTAUTH
    case 10: // NOTZONE
      Debug("dns", "DNS error %d for [%s]", h->rcode, e->qname);
      goto Lerror;

All of NOERROR, NXDOMAIN, etc., will fall through to that same Debug log line and goto Lerror. There's no need to repeat the Debug and goto Lerror statements for each case.

This can be seen with the following short example: https://godbolt.org/z/cqodTTKvq

Thank you again for working on this.

bneradt avatar Aug 20 '22 01:08 bneradt

It is also very standard (from the early days of C coding, which I remember personally) to omit break for cases that have completely identical code, as in this situation.

Also note that for C++17 there is now an standard keyword that should (must?) be used for fallthrough in switch statements. E.g.

switch (value) {
  case 0:
    // code ...
    [[fallthrough]];
  case 1:
    // code ...

See here. Note the style for case 1: and case 2: in the example.

SolidWallOfCode avatar Aug 21 '22 21:08 SolidWallOfCode

That said, the errors are a bit odd, particularly this one

         - 502 https://127.0.0.1:``/ 2 0
         ?                       ^^
         
         + 502 https://127.0.0.1:61712/ 2 0
         ?                       ^^^^^

My understanding was the double back tick indicated "anything goes" for that token. So what is the failure, exactly? The cert checking logic is a bit dependent on DNS due to how TLS vs. TCP connection errors are handled. It might be that the changes in the query reply code handling is causing this problem.

SolidWallOfCode avatar Aug 21 '22 21:08 SolidWallOfCode

This may be related to the problem:

On master -

    case NXDOMAIN:
    case 6:  // YXDOMAIN
    case 7:  // YXRRSET
    case 8:  // NOTAUTH
    case 9:  // NOTAUTH
    case 10: // NOTZONE
      Debug("dns", "DNS error %d for [%s]", h->rcode, e->qname);
      goto Lerror;

In the PR

    case NXDOMAIN:
      SiteThrottledNote("NXDOMAIN: DNS error %d for [%s]: Domain name does not exist.", h->rcode, e->qname);
      retry     = false;
      tcp_retry = false;
      goto Lerror;
      break;

Note the change in handling of retry and tcp_retry. This shouldn't happen for a PR that is only changing the logging.

SolidWallOfCode avatar Aug 22 '22 14:08 SolidWallOfCode

I returned the switch to its original order instead of chronological order, removed all the breaks, and retry in order to allow for the original intention of allowing the fallthrough. In theory this will print the statement that it is selected as well as all the subsequent statements, that comes after it. example:   switch (1){ case 0: std::cout << 0; case 1: std::cout << 1; case 2: std::cout << 2; case 3: std::cout << 3;} Output: 123   When researching the logs, it would be confusing to see 123 and have to know that the log that applies is the 1st one. Do we still want to print to log in this case?

Also, I believe I can add [[fallthrough]]; after the code in each label and it might build the failed tests on Rocky, Fedora, Debian, and AuTest. Because the Diags.h file is flagging all the fall through and failing them.

fabikd avatar Aug 23 '22 01:08 fabikd

I'm just a passerby, so I'm sorry if I'm missing some background or previous discussion, but it seems like to me that all the log messages have the same format and you could avoid the repeat. Since DNS RCODE is basically an integer from 0 to 23, I'd simply replace the current Debug lines with:

SiteThrottledNote("%s: DNS error %d for [%s]: %s", RCODE_NAME[h->rcode], h->rcode, e->qname, RCODE_DESCRIPTION[h->rcode]);

I guess using some macros, constexpr, or Lexicon from libswoc would make this a bit more efficient.

https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-6

maskit avatar Aug 23 '22 02:08 maskit

I'm just a passerby, so I'm sorry if I'm missing some background or previous discussion, but it seems like to me that all the log messages have the same format and you could avoid the repeat. Since DNS RCODE is basically an integer from 0 to 23, I'd simply replace the current Debug lines with:

SiteThrottledNote("%s: DNS error %d for [%s]: %s", RCODE_NAME[h->rcode], h->rcode, e->qname, RCODE_DESCRIPTION[h->rcode]);

I guess using some macros, constexpr, or Lexicon from libswoc would make this a bit more efficient.

https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-6

Thank you for your feedback!! I will give it a try! (:

fabikd avatar Aug 25 '22 17:08 fabikd

[approve ci autest]

bryancall avatar Aug 29 '22 23:08 bryancall

Seems fine. A couple of notes, although not worth delaying the merge.

For the names, we might consider using Lexicon, although the utility is less because it's a one way conversion. I might be tempted to have an array of string pairs, to group the tag and description, but then the use site is more complex. This is pretty deep in the style weeds so take them as potential ideas for future work.

SolidWallOfCode avatar Sep 02 '22 21:09 SolidWallOfCode