zipkin-ui

Automatically calculate and display network latency for a span in the UI

Open codefromthecrypt opened this issue 8 years ago • 4 comments

From @mansu on October 18, 2016 20:57

Currently, when we look at a span belonging to a trace in the UI, we have to calculate the network latency mentally to tell what percentage of the latency came from the network and what percentage came from request processing.

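So, if a span has "cs", "cr", "ss" and "sr", the network latency would be (sr - cs) + (cr - ss): the time for the request to reach the server plus the time for the response to get back to the client. Bonus points if the network latency can be shown like the Chrome network latency tab. I've added an image from the Chrome documentation for reference.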

[Image: Chrome network latency tab, from the Chrome documentation]
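A minimal sketch of the proposed calculation, assuming the four core annotations are available as epoch microseconds (the `SharedSpanTimestamps` container and `latency_breakdown` function are hypothetical, not part of zipkin-ui). Note that the response leg is `cr - ss`, since "cr" happens after "ss". Summing both legs is algebraically the same as client duration minus server duration, so a constant clock offset between the two hosts cancels out of the total, even though each individual leg is skewed.

```python
from dataclasses import dataclass


@dataclass
class SharedSpanTimestamps:
    """Hypothetical container for the four core annotations of one
    client/server span, as epoch microseconds."""
    cs: int  # client send
    sr: int  # server receive
    ss: int  # server send
    cr: int  # client receive


def latency_breakdown(t: SharedSpanTimestamps) -> dict:
    """Split the client-observed duration into server time and network time."""
    client_us = t.cr - t.cs  # total time as seen by the client
    server_us = t.ss - t.sr  # time spent on the server between sr and ss
    # Request leg plus response leg. Algebraically this equals
    # client_us - server_us, so a constant clock offset between the
    # client and server hosts cancels out of the total.
    network_us = (t.sr - t.cs) + (t.cr - t.ss)
    network_pct = 100.0 * network_us / client_us if client_us > 0 else 0.0
    return {
        "client_us": client_us,
        "server_us": server_us,
        "network_us": network_us,
        "network_pct": network_pct,
    }


# Usage: in this example the server clock is 10 ms ahead of the client clock.
span = SharedSpanTimestamps(cs=1_000, sr=11_300, ss=11_800, cr=2_000)
print(latency_breakdown(span))
# {'client_us': 1000, 'server_us': 500, 'network_us': 500, 'network_pct': 50.0}
```

Because only the total is skew-independent, a UI would probably want to show the network total rather than trusting the per-leg split.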

Copied from original issue: openzipkin/zipkin#1345

codefromthecrypt, Oct 24 '16 01:10

FWIW, latency is only a guess here, because the gap between "cs" and "sr" is a weak signal compared to the Resource Timing API used in Chrome: https://www.w3.org/TR/resource-timing/#resources-included

That said, using annotations that already exist in Zipkin, we could color (or otherwise highlight) the innermost "ws"/"wr" interval, falling back to "cs"/"sr" when those aren't present. Using "ws" gets closer to the goal, as there's often scheduling or other work involved before the request actually hits the wire. Plus, a "cs"/"sr" can be served from cache (no network response!), where it would be an error to record "ws" when there's nothing on the w(ire).

https://github.com/openzipkin/zipkin-api/blob/master/thrift/zipkinCore.thrift
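A minimal sketch of the fallback described above, assuming a simplified annotation model of `(value, timestamp_us, side)` tuples where side is `"client"` or `"server"` (real Zipkin v1 annotations carry an `Endpoint` instead; `first_ts` and `leg_interval` are hypothetical helpers). It prefers the sender's "ws" to the receiver's "wr" and falls back to the core annotations, tagging the result so a UI could render the two cases differently. Since multiple "ws"/"wr" events can indicate retries, this sketch simply takes the first one; a real implementation might prefer the last.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified annotation: (value, timestamp_us, side), where side
# is "client" or "server". Real Zipkin v1 annotations carry an Endpoint instead.
Annotation = Tuple[str, int, str]


def first_ts(annotations: List[Annotation], value: str, side: str) -> Optional[int]:
    """Timestamp of the first annotation with this value on this side, if any."""
    for v, ts, s in annotations:
        if v == value and s == side:
            return ts
    return None


def leg_interval(annotations, send_side, recv_side, fallback_send, fallback_recv):
    """Prefer the "ws" -> "wr" wire interval; fall back to the core annotations.

    Returns (start_us, end_us, source), or None if neither pair is present.
    """
    ws = first_ts(annotations, "ws", send_side)
    wr = first_ts(annotations, "wr", recv_side)
    if ws is not None and wr is not None:
        return ws, wr, "wire"
    start = first_ts(annotations, fallback_send, send_side)
    end = first_ts(annotations, fallback_recv, recv_side)
    if start is not None and end is not None:
        return start, end, "estimated"
    return None


annotations = [
    ("cs", 1_000, "client"), ("ws", 1_050, "client"),
    ("wr", 1_250, "server"), ("sr", 1_300, "server"),
    ("ss", 1_800, "server"), ("cr", 2_000, "client"),
]
print(leg_interval(annotations, "client", "server", "cs", "sr"))  # (1050, 1250, 'wire')
print(leg_interval(annotations, "server", "client", "ss", "cr"))  # (1800, 2000, 'estimated')
```

The request leg here uses the wire annotations, while the response leg has none and falls back to "ss" -> "cr".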

codefromthecrypt, Oct 24 '16 01:10

I did something like this locally. Is this kind of what you mean?

The button at the top right collapses the latency rows so you don't see them.

[Screenshot: zipkin-modal]

conorgriffin, Dec 02 '16 14:12

Looks nice! Actually, it points out that we probably don't want the primary duration label to be "response time", since it isn't necessarily that. While not everyone logs them, we might end up needing to special-case the "ws"/"wr" annotations when present, as they're closer to network time. For example, in a lot of instrumentation "cs" -> "sr" includes a lot of in-process overhead, too. Sometimes I've seen this at millisecond scale!

The client->server label might be too vague. But then again, I can see how labels using the service name or IP could get long. Maybe it's good how it is!

Then... there are some people using "ms"/"mr" for one-way messages. It's the same as the shared span above, except that it's used for Kafka. While it's not standardized, a few people are doing this, and supporting it might help them showcase their work.
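A minimal sketch of why one-way messages behave differently from the shared span, assuming "ms" and "mr" mark when the producer sent and the consumer received the message (the helper name and the clock offset below are hypothetical). With no reply leg, the skew-cancelling trick of the round-trip formula doesn't apply, so any clock offset between producer and consumer hosts lands directly in the number.

```python
def one_way_latency_us(ms: int, mr: int) -> int:
    """Latency of a one-way message: "mr" timestamp minus "ms" timestamp.

    With no reply leg, nothing cancels a clock offset between the producer
    and consumer hosts, so that offset is included in the result.
    """
    return mr - ms


true_latency_us = 300
consumer_clock_offset_us = 10_000  # assumed: consumer clock 10 ms ahead
ms = 1_000
mr = ms + true_latency_us + consumer_clock_offset_us
print(one_way_latency_us(ms, mr))  # 10300, not 300
```

A UI showing this value would probably want to flag it as an estimate.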

Anyway, good start! I like it.

codefromthecrypt, Dec 02 '16 21:12

@conorgriffin Yes, that's exactly what I was suggesting. This is a great start!

What do you think about putting this info as a breakdown of "Response time" at the top, instead of inline in the span? I think that would make better use of the white space and not take up vertical space, which can be at a premium when looking at spans with a lot of tags.

mansu, Dec 02 '16 22:12