trafficserver
trafficserver copied to clipboard
dns error logging to diags
Logs using SiteThrottled* version is helpful for noisy logs, being used here instead of print statements to help with the possible retries when a dns code occurs
[approve ci]
@fabikd : can you please run make -j8 format
and commit and push the changes? That should satisfy the clang-format CI job.
[approve ci autest]
Right, I think it's a great idea to have @SolidWallOfCode look at it. I used the code from the master branch, https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1556
should I have used a different version? I’ve looked at the existing code in the master branch and these 3 labels did show goto Lerror
: default, NOTIMP, NOTZONE, and my intent was to leave goto Lerror
as is. I do not see goto Lerror
on this version of the code https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1556 for FORMERR, NXDOMAIN, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH and it was not my intention to remove goto Lerror
on any label.
As for the breaks, usually a switch statement has breaks after each case to avoid passing through the code under the next label. If I am using the correct version of the code, was the intention of it to not break and pass through all the named labels and use the last goto Lerror
as a catch all?
https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1581
If so, I can add goto Lerror
to FORMERR, NXDOMAIN, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH. Just let me know! Thank you!! (:
I used the code from the master branch, https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1556 should I have used a different version?
Your PR is correctly based on the master
branch. Nothing to change there.
I’ve looked at the existing code in the master branch and these 3 labels did show
goto Lerror
: default, NOTIMP, NOTZONE, and my intent was to leavegoto Lerror
as is. I do not seegoto Lerror
on this version of the code https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1556 for FORMERR, NXDOMAIN, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH and it was not my intention to removegoto Lerror
on any label.As for the breaks, usually a switch statement has breaks after each case to avoid passing through the code under the next label.
Developers must intentionally break
, return
, goto
, or not based on the flow of code they desire. Without statements such as these, the runtime path will "fall through" to the following case statements.
If I am using the correct version of the code, was the intention of it to not break and pass through all the named labels and use the last
goto Lerror
as a catch all? https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1581 If so, I can addgoto Lerror
to FORMERR, NXDOMAIN, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH. Just let me know! Thank you!! (:
Correct. Unless break
statements or the like are used, case
statements fall through to next ones. Thus, in this block from master branch (before your change):
case NOERROR:
case NXDOMAIN:
case 6: // YXDOMAIN
case 7: // YXRRSET
case 8: // NOTAUTH
case 9: // NOTAUTH
case 10: // NOTZONE
Debug("dns", "DNS error %d for [%s]", h->rcode, e->qname);
goto Lerror;
All of NOERROR
, NXDOMAIN
, etc., will fall through to that same Debug
log line and goto Lerror
. There's no need to repeat the Debug
and goto Lerror
statements for each case.
This can be seen with the following short example: https://godbolt.org/z/cqodTTKvq
Thank you again for working on this.
It is also very standard (from the early days of C coding, which I remember personally) to omit break
for cases that have completely identical code, as in this situation.
Also note that for C++17 there is now an standard keyword that should (must?) be used for fallthrough in switch
statements. E.g.
switch (value) {
case 0:
// code ...
[[fallthrough]];
case 1:
// code ...
See here. Note the style for case 1:
and case 2:
in the example.
That said, the errors are a bit odd, particularly this one
- 502 https://127.0.0.1:``/ 2 0
? ^^
+ 502 https://127.0.0.1:61712/ 2 0
? ^^^^^
My understanding was the double back tick indicated "anything goes" for that token. So what is the failure, exactly? The cert checking logic is a bit dependent on DNS due to how TLS vs. TCP connection errors are handled. It might be that the changes in the query reply code handling is causing this problem.
This may be related to the problem:
On master -
case NXDOMAIN:
case 6: // YXDOMAIN
case 7: // YXRRSET
case 8: // NOTAUTH
case 9: // NOTAUTH
case 10: // NOTZONE
Debug("dns", "DNS error %d for [%s]", h->rcode, e->qname);
goto Lerror;
In the PR
case NXDOMAIN:
SiteThrottledNote("NXDOMAIN: DNS error %d for [%s]: Domain name does not exist.", h->rcode, e->qname);
retry = false;
tcp_retry = false;
goto Lerror;
break;
Note the change in handling of retry
and tcp_retry
. This shouldn't happen for a PR that is only changing the logging.
I returned the switch to its original order instead of chronological order, removed all the breaks, and retry in order to allow for the original intention of allowing the fallthrough. In theory this will print the statement that it is selected as well as all the subsequent statements, that comes after it. example:
switch (1){ case 0: std::cout << 0; case 1: std::cout << 1; case 2: std::cout << 2; case 3: std::cout << 3;}
Output: 123
When researching the logs, it would be confusing to see 123
and have to know that the log that applies is the 1st one. Do we still want to print to log in this case?
Also, I believe I can add [[fallthrough]];
after the code in each label and it might build the failed tests on Rocky, Fedora, Debian, and AuTest. Because the Diags.h file is flagging all the fall through and failing them.
I'm just a passerby, so I'm sorry if I'm missing some background or previous discussion, but it seems like to me that all the log messages have the same format and you could avoid the repeat. Since DNS RCODE is basically an integer from 0 to 23, I'd simply replace the current Debug
lines with:
SiteThrottledNote("%s: DNS error %d for [%s]: %s", RCODE_NAME[h->rcode], h->rcode, e->qname, RCODE_DESCRIPTION[h->rcode]);
I guess using some macros, constexpr
, or Lexicon
from libswoc would make this a bit more efficient.
https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-6
I'm just a passerby, so I'm sorry if I'm missing some background or previous discussion, but it seems like to me that all the log messages have the same format and you could avoid the repeat. Since DNS RCODE is basically an integer from 0 to 23, I'd simply replace the current
Debug
lines with:SiteThrottledNote("%s: DNS error %d for [%s]: %s", RCODE_NAME[h->rcode], h->rcode, e->qname, RCODE_DESCRIPTION[h->rcode]);
I guess using some macros,
constexpr
, orLexicon
from libswoc would make this a bit more efficient.https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-6
Thank you for your feedback!! I will give it a try! (:
[approve ci autest]
Seems fine. A couple of notes, although not worth delaying the merge.
For the names, we might consider using Lexicon, although the utility is less because it's a one way conversion. I might be tempted to have an array of string pairs, to group the tag and description, but then the use site is more complex. This is pretty deep in the style weeds so take them as potential ideas for future work.