linkerd-zipkin

zipkin-http error: name resolution is negative

Open carlislk opened this issue 6 years ago • 8 comments

Curious if anyone here had insight into this error. I am seeing it roughly every 5 seconds on instances where it is showing up.

THREAD130: zipkin-http: name resolution is negative (local dtab: Dtab()) source: stderr tag: linkerd/91a2f8a72676

We recently added the linkerd-zipkin module to our environment, where we are running Linkerd, Namerd, and Consul.

carlislk avatar Jun 08 '18 18:06 carlislk

Hi @carlislk. This means that Linkerd is unable to resolve an address, possibly the address of the Zipkin collector. How is your zipkin-http telemeter configured?

adleong avatar Jun 08 '18 19:06 adleong

Hi @adleong. Thanks for your response.

telemetry:
- kind: io.zipkin.http
  host: collector.Domain.com:9411
  initialSampleRate: 0.40
- kind: io.l5d.influxdb

Where collector.Domain.com points at an AWS NLB and targets are jaeger-collector containers running in ECS.

I am able to resolve the DNS name of this address from the instances where this error is occurring. Could this be an issue with caching?

carlislk avatar Jun 08 '18 20:06 carlislk

It's certainly possible. Linkerd uses the JVM DNS resolver. Tools like dig could be useful for figuring out the details of the DNS setup and whether the JVM is picking up the correct system DNS settings.
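
To compare what dig reports with what the JVM itself resolves, a small standalone program can be run on an affected instance. This is only a sketch: the class name `JvmDnsCheck` and its `resolve` helper are made up for illustration and are not part of Linkerd. It uses the standard `java.net.InetAddress` API, which is the JVM's resolver.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Linkerd: resolves a hostname through the
// JVM's own resolver so the result can be compared with `dig` output.
final class JvmDnsCheck {

    // Returns the textual IP addresses the JVM resolves for `host`,
    // or an empty list if the JVM cannot resolve it.
    static List<String> resolve(String host) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Resolution failed from the JVM's point of view; return no addresses.
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Pass the collector hostname (without the port) as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + resolve(host));
    }
}
```

Running it with the collector hostname as the argument and getting an empty list, while dig succeeds on the same instance, would suggest the JVM is not using the same DNS settings as the system tools.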

adleong avatar Jun 08 '18 21:06 adleong

From what I can tell, Linkerd caches DNS results and does not re-resolve on failure. For example, if we update the record set, Linkerd keeps using the old entry and does not attempt to look up the new entry upon failure. Is this expected behavior?

Also I am trying to understand why the following is in the output: (local dtab: Dtab())

Where can I find what address linkerd is storing for the collector address mentioned above?

carlislk avatar Jun 08 '18 22:06 carlislk

(local dtab: Dtab()) simply indicates that Linkerd isn't using a routing override. You can safely ignore this.

Unfortunately, I don't think the result of the DNS resolution is exposed anywhere.

adleong avatar Jun 12 '18 18:06 adleong

@carlislk DNS caching on the JVM has always been a bit... unfortunate. Can you tell us a bit more about your use case?

There's an AWS-specific article about this very problem, though you may have to do a little work to apply it to a containerized Linkerd: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html

wmorgan avatar Jun 13 '18 12:06 wmorgan

@wmorgan Thanks for your response. It seems possible that the JVM TTL mentioned in the article could be causing this issue. Do we know what the default TTL is for this module / Linkerd? Also, any ideas where I could find / modify this setting?

carlislk avatar Jun 20 '18 18:06 carlislk

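According to Oracle's Networking Properties doc, the default for networkaddress.cache.ttl is -1, which means cache forever, when a security manager is installed; otherwise the default is implementation-specific. Note that it is a security property rather than a system property, so a plain -Dnetworkaddress.cache.ttl flag will not take effect. I would imagine you could set it in the JVM's java.security file, or use the legacy -Dsun.net.inetaddr.ttl system property. Like @wmorgan mentioned, you may have to do a little work to get that working in a containerized Linkerd.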
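
For reference, here is a minimal sketch of how the positive DNS cache TTL can be inspected and set from JVM code. `networkaddress.cache.ttl` is a Java security property, so it is read and written through `java.security.Security`. The class name `DnsCacheTtl` and the value of 60 seconds are illustrative assumptions, not Linkerd settings.

```java
import java.security.Security;

// Hypothetical helper, not part of Linkerd: reads and sets the JVM's
// TTL for cached successful DNS lookups.
final class DnsCacheTtl {

    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    // Sets the TTL, in seconds, for successful lookups. The JVM typically
    // reads this once, so it should run before the first DNS lookup.
    static void setTtlSeconds(int seconds) {
        Security.setProperty(TTL_PROPERTY, Integer.toString(seconds));
    }

    // Returns the configured value, or a note that the JVM default applies.
    static String currentTtl() {
        String value = Security.getProperty(TTL_PROPERTY);
        return value == null ? "unset (JVM default applies)" : value;
    }

    public static void main(String[] args) {
        System.out.println("before: " + currentTtl());
        setTtlSeconds(60); // illustrative value, in seconds
        System.out.println("after: " + currentTtl());
    }
}
```

Since Linkerd's own code cannot easily be changed this way, in a container the same effect would more likely come from editing the JVM's java.security file or passing the legacy -Dsun.net.inetaddr.ttl system property, assuming the image gives you a way to pass JVM options.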

dadjeibaah avatar Jun 20 '18 19:06 dadjeibaah