Making sense of large traces

Open jorgheymans opened this issue 3 years ago • 0 comments

Description It's a trend that sites are enthusiastically instrumenting everything in order to increase observability of what is happening, or so they hope. This leads to large traces being generated. Imagine a noisy JPA batch update on a couple of hundred records (one span per update), or an instrumented message being published from a producer to a horde of instrumented consumers (one span per poll plus subsequent spans for the record processing; when the consumer also broadcasts events, this goes on ad infinitum).

Feature When presented with a large trace (assume a couple of hundred spans or more), the Zipkin UI should allow the person looking at the trace to make sense of it. This 'making sense' is the topic of this issue: in what ways would people like to make sense of such a large trace to find what they are looking for? Some ideas that have been thought of:

  • query inside a trace (@basvanbeek can you give some specific examples here of what you were thinking of?)
  • generate a summary view (e.g. top 5 latency contributors in a trace)
  • filter a trace by span type (e.g. toggle off the jdbc spans)
  • ...
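
To make the summary and filter ideas above a bit more concrete, here is a minimal sketch in Python, not a proposal for how the UI should implement them. It assumes spans in the Zipkin v2 JSON shape (`id`, `parentId`, `name`, `duration` in microseconds, `tags`). The function names, the sample trace, and the choice of the `sql.query` tag to recognise JDBC spans are assumptions made for illustration. "Latency contributor" is interpreted here as self time (a span's duration minus the durations of its direct children), which is only one possible definition and ignores overlapping children and shared client/server span IDs.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Span = Dict[str, Any]  # one span in Zipkin v2 JSON form


def top_latency_contributors(spans: List[Span], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n spans with the highest self time as (name, self_time) pairs.

    Self time is the span's duration minus the summed durations of its direct
    children, clamped at zero because parallel children can add up to more
    than the parent's duration.
    """
    child_time: Dict[str, int] = defaultdict(int)
    for span in spans:
        parent_id = span.get("parentId")
        if parent_id is not None:
            child_time[parent_id] += span.get("duration", 0)

    scored = [
        (max(span.get("duration", 0) - child_time[span["id"]], 0), span)
        for span in spans
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(span["name"], self_time) for self_time, span in scored[:n]]


def hide_spans_with_tag(spans: List[Span], tag_key: str) -> List[Span]:
    """Drop every span carrying the given tag key (a naive 'toggle off' filter).

    Children of a hidden span are kept as-is, so a real UI would still need
    to decide how to re-attach or collapse them.
    """
    return [span for span in spans if tag_key not in span.get("tags", {})]


# Hypothetical three-span trace, made up for this example.
sample_trace: List[Span] = [
    {"traceId": "a", "id": "1", "name": "post /orders", "duration": 1000},
    {"traceId": "a", "id": "2", "parentId": "1", "name": "update",
     "duration": 300, "tags": {"sql.query": "update orders set ..."}},
    {"traceId": "a", "id": "3", "parentId": "1", "name": "send", "duration": 200},
]

summary = top_latency_contributors(sample_trace, n=2)
without_jdbc = hide_spans_with_tag(sample_trace, "sql.query")
```

With the sample trace, `summary` ranks the root span first (500 µs of self time) followed by the `update` span (300 µs), and `without_jdbc` keeps only the two spans that carry no `sql.query` tag.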

Rationale Large traces are a thing. As much as we would like to tell people that they are not always useful, they will end up being generated, and we should have a way to cope with them.

Example Scenario At our site (EC), one of the larger service ecosystems is constantly confronted with very large traces. They are not easy to navigate or troubleshoot, and the team has started to look for ways to remediate this problem:

image

Not included in this issue

  • performance improvements to client-side rendering, as reported several times already in #2496 #2230 #2411 #1460. Though the concepts presented here could have a positive impact (for example by not showing the complete trace from the start), that is not the goal of this issue.

Prior Art @tacigar has been doing quite a few experiments on different visualizations

  • https://github.com/openzipkin/zipkin/pull/3213

Related

  • #2638: filtering services
  • #2920: collapse spans
  • #2496: allow to limit number of spans
  • #2296: optionally mask local / intermediate spans
  • #2622: service topology view for a single trace

jorgheymans, Oct 08 '20 13:10