redpanda Excessive DNS queries by metrics reporter when on failed response

Open larsenpanda opened this issue 3 years ago • 1 comments

Version & Environment

Redpanda: v22.1.4 (rev 491e569)
CentOS: 3.10.0-1160.53.1.el7.x86_64

What went wrong?

Someone installed Redpanda in a secure sandbox environment (one without DNS access to resolve internet hosts). Our metrics reporter, which is on by default, ended up sending "800 dns queries per second" to their internal DNS server. That triggered an alarm and prompted them to block the three Redpanda nodes from issuing queries against it.

I'd like to know why it retries so aggressively when it isn't getting an IP it can use.

What should have happened instead?

Perhaps we need a backoff or similar when no IP is resolved. I suspect the DNS server responds with SERVFAIL, but we may not want to key on that specific condition.
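To illustrate the kind of backoff suggested above, here is a minimal Python sketch. It is not Redpanda's implementation, and every name in it (`backoff_delays`, `resolve_with_backoff`, the `base`/`cap`/`factor` defaults) is hypothetical. It treats any resolution failure the same way rather than keying on SERVFAIL, and spaces retries with capped exponential delays plus jitter.

```python
import random
import socket
import time


def backoff_delays(base=1.0, cap=300.0, factor=2.0):
    """Yield an unbounded sequence of capped exponential delays, in seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def resolve_with_backoff(host, port, max_attempts=5,
                         resolver=socket.getaddrinfo,
                         sleep=time.sleep, jitter=random.random):
    """Try to resolve host, waiting longer after each failed attempt.

    Returns the resolver's result, or None if every attempt failed.
    The defaults are illustrative assumptions, not tuned values.
    """
    delays = backoff_delays()
    for attempt in range(max_attempts):
        try:
            return resolver(host, port)
        except socket.gaierror:
            # Any resolution failure counts; no special case for SERVFAIL.
            if attempt == max_attempts - 1:
                break
            # Full jitter: sleep a random fraction of the current delay so
            # several nodes do not retry in lockstep.
            sleep(next(delays) * jitter())
    return None
```

With these defaults, a host that never resolves costs at most five queries per call, separated by waits of up to 1, 2, 4, and 8 seconds, instead of a tight retry loop.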

How to reproduce the issue?

I can't provide reproduction steps because they pertain to the security architecture, which the Linux admin is not able to share. We should be able to reproduce this with a BIND DNS server that disallows internet-based IP ranges in its responses.

Additional information

It was not possible to obtain logs.

Aug 01 '22 17:08 larsenpanda

Notwithstanding the overly aggressive retries, for systems not connected to the internet one should set enable_metrics_reporter to false.

Aug 03 '22 08:08 jcsp
