Enhancement (cache): Add log or metric for missing SOA on negative response

Open gcs278 opened this issue 9 months ago • 1 comments

What would you like to be added:

It's known that the cache plugin doesn't store negative (NXDOMAIN) responses that lack an SOA record, in compliance with https://tools.ietf.org/html/rfc2308#section-5:

   Negative responses without SOA records SHOULD NOT be cached as there
   is no way to prevent the negative responses looping forever between a
   pair of servers even with a short TTL.

However, as in https://github.com/coredns/coredns/pull/3755, we have users whose upstream servers send non-compliant NXDOMAIN responses without an SOA (https://datatracker.ietf.org/doc/html/rfc2308#section-3). Because CoreDNS does not cache these responses, the DNS load on the upstream servers increases significantly.

Rather than enabling caching of NXDOMAIN responses with no SOA, as proposed in https://github.com/coredns/coredns/pull/3755, I'm curious whether the community would be open to adding a log message and/or metric that gives better visibility into this problematic, non-compliant situation.

As for log message vs metric: a log message at a minimum would be nice, but a metric of some sort (maybe coredns_forward_negative_response_missing_soa_total) would be even better, as it would allow our platform to create alerts on missing SOAs.
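To make the idea concrete, here is a minimal sketch of the kind of check and counter I have in mind. It uses simplified stand-in types (`response`, `record`) and a plain atomic counter rather than CoreDNS's actual plugin, DNS message, or metrics APIs. All function and variable names are hypothetical, and only NXDOMAIN is treated as a negative response to keep the example short.

```go
package main

import "sync/atomic"

// Standard DNS numeric codes (RFC 1035).
const (
	rcodeSuccess  = 0 // NOERROR
	rcodeNXDomain = 3 // NXDOMAIN
	typeSOA       = 6 // SOA record type
)

// record and response are simplified stand-ins for real DNS message types.
// They are assumptions made for this sketch, not CoreDNS types.
type record struct {
	Type uint16
}

type response struct {
	Rcode     int
	Authority []record
}

// missingSOATotal plays the role of the proposed counter metric
// (for example coredns_forward_negative_response_missing_soa_total).
var missingSOATotal uint64

// hasSOA reports whether the authority section carries an SOA record.
func hasSOA(resp response) bool {
	for _, rr := range resp.Authority {
		if rr.Type == typeSOA {
			return true
		}
	}
	return false
}

// recordMissingSOA increments the counter and returns true when resp is an
// NXDOMAIN response without an SOA, i.e. one that RFC 2308 section 5 says
// should not be cached.
func recordMissingSOA(resp response) bool {
	if resp.Rcode != rcodeNXDomain || hasSOA(resp) {
		return false
	}
	atomic.AddUint64(&missingSOATotal, 1)
	return true
}

// missingSOACount returns the current counter value.
func missingSOACount() uint64 {
	return atomic.LoadUint64(&missingSOATotal)
}
```

In a real implementation, the same branch that increments the counter could also emit the log message, so that both signals come from a single check.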

Why is this needed:

The motivation for a log message or metric is to:

  1. Provide better visibility into this non-compliant situation, which can result in overloading upstream DNS servers.
  2. Encourage users to pursue fixing upstream servers that send non-compliant NXDOMAIN responses.

I'm happy to create a PR with the log and/or metric, provided there is some agreement on whether a log and/or a metric is an appropriate solution. I am also curious whether there is any precedent for logging non-compliant scenarios like this.

gcs278 · May 16 '24 19:05