Enhancement (cache): Add log or metric for missing SOA on negative response
What would you like to be added:
It's known that the cache plugin doesn't cache negative (NXDOMAIN) responses that lack an SOA record, in compliance with https://tools.ietf.org/html/rfc2308#section-5:
Negative responses without SOA records SHOULD NOT be cached as there
is no way to prevent the negative responses looping forever between a
pair of servers even with a short TTL.
However, as in https://github.com/coredns/coredns/pull/3755, we have users whose upstream servers send non-compliant NXDOMAIN responses without an SOA (https://datatracker.ietf.org/doc/html/rfc2308#section-3). Because CoreDNS does not cache these responses, the DNS load on the upstream servers increases significantly.
Unlike the solution presented in https://github.com/coredns/coredns/pull/3755 which enables caching of NXDOMAIN responses with no SOA, I'm curious if the community would be open to adding a log message and/or metric that would create better visibility for this problematic and non-compliant situation.
As for log message vs metric: a log message at a minimum would be nice, but a metric of some sort (maybe coredns_forward_negative_response_missing_soa_total) would be even better, as it would allow our platform to create alerts on missing SOAs.
Why is this needed:
The motivation for a log message or metric is to:
- Provide better visibility into this non-compliant situation, which can result in overloading upstream DNS servers.
- Encourage users to fix the non-compliant NXDOMAIN responses of their upstream servers.
I'm happy to create a PR with the log and/or metric, provided there is some agreement on whether a log and/or a metric is an appropriate solution. I am also curious whether there is any precedent for logging non-compliant scenarios like this.