linkerd-zipkin
zipkin-http error: name resolution is negative
Curious if anyone here had insight into this error. I am seeing it roughly every 5 seconds on instances where it is showing up.
THREAD130: zipkin-http: name resolution is negative (local dtab: Dtab())
source: stderr
tag: linkerd/91a2f8a72676
We recently added the linkerd-zipkin module to our environment, where we are running linkerd, namerd, and consul.
Hi @carlislk. This means that Linkerd is unable to resolve an address. Possibly the address of the zipkin collector. How is your zipkin-http telemeter configured?
Hi @adleong. Thanks for your response.
telemetry:
- kind: io.zipkin.http
  host: collector.Domain.com:9411
  initialSampleRate: 0.40
- kind: io.l5d.influxdb
Where collector.Domain.com points at an AWS NLB and targets are jaeger-collector containers running in ECS.
I am able to resolve the DNS name of this address from the instances where this error is occurring. Could this be an issue with caching?
It's certainly possible. Linkerd uses the JVM DNS resolver. Perhaps tools like dig could be useful for figuring out the details of the DNS setup and whether the JVM is picking up the correct system DNS settings.
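Since dig queries DNS directly while Linkerd goes through the JVM's resolver, it may also help to ask the JVM itself what it sees. Below is a minimal sketch, not anything shipped with Linkerd: the class name JvmDnsCheck is made up for illustration, and it assumes a JDK is available on the affected instance (ideally the same Java version Linkerd runs on).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints what the JVM resolver returns for a hostname.
public class JvmDnsCheck {
    // Resolve a hostname through the JVM's resolver (InetAddress) and
    // return every address it reports, as plain strings.
    static List<String> resolve(String host) throws UnknownHostException {
        List<String> out = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            out.add(addr.getHostAddress());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Pass the collector hostname (without the port) as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + resolve(host));
    }
}
```

On Java 11 or later this can be run without a separate compile step, for example java JvmDnsCheck.java collector.Domain.com. If the addresses differ from what dig reports, the JVM's DNS settings or cache are the likely suspects.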
From what I can tell, Linkerd caches DNS results and will not re-resolve them on failure. So, for example, if we update the record set, Linkerd keeps using the old entry and does not attempt to look up the new one when connections fail. Is this expected behavior?
Also I am trying to understand why the following is in the output:
(local dtab: Dtab())
Where can I find what address linkerd is storing for the collector address mentioned above?
(local dtab: Dtab()) simply indicates that Linkerd isn't using a routing override. You can safely ignore this.
Unfortunately, I don't think the result of the DNS resolution is exposed anywhere.
@carlislk DNS caching on the JVM has always been a bit... unfortunate. Can you tell us a bit more about your use case?
There's an AWS-specific article about this very problem, though you may have to do a little work to apply it to a containerized Linkerd: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html
@wmorgan Thanks for your response. It seems possible that the JVM TTL mentioned in the article could be causing this issue. Do we know what the default TTL is for this module / Linkerd? Also, any ideas where I could find or modify this setting?
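According to Oracle's Networking Properties doc, a networkaddress.cache.ttl value of -1 means cache forever. That is the default when a security manager is installed; otherwise the JVM caches for an implementation-specific period. Note that networkaddress.cache.ttl is a security property rather than a system property, so it is normally set in the JVM's java.security file. I would imagine you could instead pass the corresponding system property on the JVM command line, like -Dsun.net.inetaddr.ttl=60, or something like that. Like @wmorgan mentioned, you may have to do a little work to get that working in a containerized Linkerd.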
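For reference, the AWS article linked above sets the TTL from code through java.security.Security. Here is a small standalone sketch of that approach. The class name DnsCacheTtl and the 60-second value are only illustrative, and this is not something Linkerd exposes directly. For Linkerd itself, the equivalent would more likely be a java.security edit in the image or a JVM option.

```java
import java.security.Security;

// Hypothetical helper showing how the JVM DNS cache TTL is set and read back.
public class DnsCacheTtl {
    private static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    // Set the TTL (in seconds) for successful lookups. This should run
    // before the JVM performs its first name lookup, because the cache
    // policy is read when the networking classes are initialized.
    static void setTtlSeconds(int seconds) {
        Security.setProperty(TTL_PROPERTY, Integer.toString(seconds));
    }

    // Returns the configured value, or null if the property is not set
    // and the JVM is using its built-in default.
    static String currentTtl() {
        return Security.getProperty(TTL_PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println("before: " + currentTtl());
        setTtlSeconds(60); // illustrative value, not a recommendation
        System.out.println("after: " + currentTtl());
    }
}
```

Printing the value before setting it is also a quick way to see whether the JVM in your image has an explicit TTL configured at all.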