specification How to record causal relationships / sequencing between sibling spans

EDIT: decided below to re-focus this issue on the sequencing of sibling spans. Original title was "Provide example of reporting "middleware" hops"

Suppose we have an RPC call from Service A to Service B. In classic Zipkin, service A starts a "client" span, and service B joins that span as "server". This results in a single span in the storage demarcated by cs->sr->ss->cr annotations. The new opentracing API advocates using different spans for client and server, but that's beside the point.

The question is what happens if there is some middleware between A and B that can also enrich the trace (for example, haproxy, or Hyperbahn). There may also be more than one hop through the middleware until the request reaches service B. There are two ways to represent this in the span-based tracing model:

Nested spans

 client span (Service A)
+----------------------------------------------------------------------------+

    +---------------------------------------------------------------------+
     hop 1      
               +----------------------------------------------------------+
                hop 2
                          +-------------------+
                           server  (Service B)

Issue 1: when building service dependency graph, this trace will produce a dependency A->MW->MW->B. If there are many dependencies like this, the diagram will look like everything depends on MW, and MW talks to everything, but the A->B dependency is lost.

Possible solution: mark the "hop" spans with a special attribute indicating middleware, and handle them specially when building dependency diagram.
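
A minimal sketch of that special handling, assuming the trace has already been reduced to a simple call chain of hops. The `Hop` type and its `is_middleware` flag are hypothetical names used for illustration, not part of any existing API:

```python
from typing import List, NamedTuple, Set, Tuple


class Hop(NamedTuple):
    service: str
    is_middleware: bool  # hypothetical marker attribute carried by the span


def dependency_edges(chain: List[Hop]) -> Set[Tuple[str, str]]:
    """Return caller->callee edges, treating middleware hops as pass-through."""
    edges: Set[Tuple[str, str]] = set()
    caller = None
    for hop in chain:
        if hop.is_middleware:
            continue  # skip the hop; the last real service stays the caller
        if caller is not None:
            edges.add((caller, hop.service))
        caller = hop.service
    return edges


chain = [Hop("A", False), Hop("MW", True), Hop("MW", True), Hop("B", False)]
print(dependency_edges(chain))  # {('A', 'B')}
```

With the marker in place the A->B dependency survives, and a dependency job could still report the middleware separately if it wanted to.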

Issue 2: if the middleware is implemented as a proxy, it makes sense that a "hop" span does not complete until the server span is complete. However, if the middleware is implemented as a messaging system, the above trace does not make sense, it should look like below.

Stacked sibling spans

 client span (Service A)
+----------------------------------------------------------------------------+
    +------+   +------+   +-------------------+ +----------+ +------------+
     hop 1      hop 2      server                back-hop-2   back-hop-1

Issue: in order to display the trace as shown above, especially in light of clock skews, the UI needs to know that there is a strong happened-before relationship between spans. The current DCP API does not capture that relationship, and it's not clear if it can be captured via span annotations since each stacked span knows nothing about its siblings. In contrast, X-Trace API explicitly captured these relationships by means of using pushDown and pushNext operations.

Dec 02 '15 20:12 yurishkuro

@bensigelman @adriancole

Dec 02 '15 23:12 yurishkuro

Idea A

How about a HappensAfterSpan and/or HappensBeforeSpan tag? The values would be span_ids or similar. To be sufficiently general, that would imply that span tags are 1:many rather than 1:1 OR that we would use span log payloads to express these relationships.

If we reference the X-Trace paper:

or, not as an image:

next.parentID ⇐ current.opID
next.opID ⇐ unique()
next.type ⇐ NEXT

Wherever the pushNext would have happened in the X-Trace world, we instead add a HappensBeforeSpan annotation to current.

Idea B

We could instead follow the X-Trace model more directly and create something like a ParentType span annotation that could be set to NEXT instead of the default, DOWN. Something like that.

Thoughts?

Dec 03 '15 01:12 bhs

Meta-comment: on RFC's let's use real things!

Ex. we learned from zipkin that people rarely understood anything. How about revamping your example to use things tons of people understand.

Ex. instead of client-span -> hop1 -> hop2 -> server

browser -> elastic load balancer -> ha proxy -> tomcat

you can then refer to these in your example. It will help, as you can establish common ground with people who are not used to Span jargon, yet :) Ex. `X-Forwarded-For` can help guide discussion.

One thing I learned in zipkin is that few know how social company RPC stuff, like finagle or autobahn, works, so using these as examples actually creates cognitive distance rather than shortening it. Favoring the "EC2 crowd" will lower the barrier to entry in discussions like this, from folks who already know zipkin etc to a wider group of those who were left behind.

/me ends meta comment

for real comment, too distracted to think about a solution deeply right now, except this looks like a quite valid concern.

Dec 03 '15 02:12 codefromthecrypt

@bensigelman Was thinking a lot about it lately. The x-trace's pushDown/pushNext (I assume we're talking about this paper) doesn't make sense to me as it was presented, e.g. in Fig.2 they show top-left node doing both pushDown and pushNext, as if it's doing two different transmissions or encodes two different trace contexts in the single transmission, one per logical layer.

How about a HappensAfterSpan and/or HappensBeforeSpan tag? The values would be span_ids or similar.

I think something like this is doable, although the first span that should be tagged with HappensBeforeSpan cannot know the next span ID. But it does know that it's fully finished before the next span starts. So it can emit an equivalent of pushNext annotation (or "finish-to-start" for people familiar with MS Project dependencies), and preserve its own span ID in the trace context so that the next sibling can emit HappensAfterSpan=sid, yet still register itself as a child of the original parent span.
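
A rough sketch of that hand-off, under the assumption that the trace context can carry one extra field. `TraceContext`, `Span`, `start_sibling`, and `finish` are hypothetical names for illustration and do not correspond to any published OpenTracing API; only the `HappensAfterSpan` tag name comes from this thread:

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional

_ids = itertools.count(1)  # stand-in for a real span ID generator


@dataclass
class TraceContext:
    parent_id: int                         # span that owns all the siblings
    prev_sibling_id: Optional[int] = None  # last finished sibling, if any


@dataclass
class Span:
    span_id: int
    parent_id: int
    tags: Dict[str, int] = field(default_factory=dict)


def start_sibling(ctx: TraceContext) -> Span:
    """Start a child of ctx.parent_id, tagged with its predecessor if known."""
    span = Span(span_id=next(_ids), parent_id=ctx.parent_id)
    if ctx.prev_sibling_id is not None:
        span.tags["HappensAfterSpan"] = ctx.prev_sibling_id
    return span


def finish(span: Span, ctx: TraceContext) -> TraceContext:
    """Finish the span and pass its ID on to whoever starts the next sibling."""
    return TraceContext(parent_id=ctx.parent_id, prev_sibling_id=span.span_id)


ctx = TraceContext(parent_id=100)  # 100 stands for the client span (Service A)
hop1 = start_sibling(ctx)
ctx = finish(hop1, ctx)
hop2 = start_sibling(ctx)          # same parent, but knows it follows hop1
```

Here hop2 remains a child of the client span and only the later sibling carries the ordering tag, so the earlier span never needs to know the next span ID.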

So the remaining question is how we want to capture this in the API.

Dec 21 '15 17:12 yurishkuro

Re the X-Trace paper: my understanding was that the top-left operation would record just the one piece of metadata, and that the pushNext and pushDown therein would be considered their own "operations" with their own context to log. I.e., the number of contexts in a dapper/zipkin model is not 1:1 with the number of contexts in an equivalent X-Trace model.

In any case, the most important question is the one you end with: how best to represent this in the programmer-facing API? The safest thing (IMO) would be to start with a lower-level API and leave it at that until we have greater evidence around the particular data model. By "lower-level," I mean just some simple function calls that abstract away the particular names we choose for "Happens-Before" tag keys, etc.

While I consider this topic an important one in the long-term, I don't want it to stumble into a lot of complexity that distracts us from the more pressing matter of getting publishable APIs out in go+py+js+java (or whatever else we decide to prioritize early on). Thoughts?

Dec 25 '15 21:12 bhs

Some background on zipkin.

This scenario is supported by the shared span model. Ex. in zipkin, multiple endpoints participate in the same span. This allows you to see the server and client on the same line. This also allows you to see any proxies in the same line.

Here's an example:

span [
{time 0, "cs", source},
{time 1, "firewall applied", proxy},
{time 2, "sr", destination},
...
]

A decision to squash proxies is highly subjective aka policy. A presentation layer could be taught to collapse proxies with the same span id via some policy? In zipkin, the "real" destination is annotated as a tag "sa". Using this, you could implement a policy to squash hops between the client and the server. Would something like this not work?
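
One way such a presentation-layer policy might look, as a sketch only. It assumes the policy is just a set of endpoint names considered proxies; the `Annotation` type and `squash_proxies` function are hypothetical and this is not how zipkin implements anything:

```python
from typing import List, NamedTuple, Set


class Annotation(NamedTuple):
    time: int
    value: str
    endpoint: str


def squash_proxies(annotations: List[Annotation], proxies: Set[str]) -> List[Annotation]:
    """Drop annotations logged by endpoints that the policy treats as proxies."""
    return [a for a in annotations if a.endpoint not in proxies]


shared_span = [
    Annotation(0, "cs", "source"),
    Annotation(1, "firewall applied", "proxy"),
    Annotation(2, "sr", "destination"),
]
for a in squash_proxies(shared_span, proxies={"proxy"}):
    print(a.time, a.value, a.endpoint)
```

The stored span is left untouched; only the rendered view changes, which keeps the squashing decision a matter of policy.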

On the happens-before question (relating to clock skew), seems a separate albeit related issue.

Dec 30 '15 02:12 codefromthecrypt

Also, assuming we aren't doing shared spans (which is ok by me), we could still make a type for proxies similar to this. That also would allow presentation tier to choose to squash them without larger model changes.. thoughts? https://cloud.google.com/cloud-trace/api/reference/rest/v1/projects.traces#SpanKind

Dec 30 '15 02:12 codefromthecrypt

@adriancole I agree completely that RPCs can and probably should be rendered as a single row in a conventional zipkin/dapper-style UI... yet from a data modeling perspective there is still a strong case to be made for multiple spans per RPC. And the SpanKind concept could work well.

PS: I don't think (?) the dapper paper addressed this, but in older versions of stubby (google's RPC subsystem) there were sometimes user-space queuing issues in high-throughput processes... as such, the trace UI showed server time, the full end-to-end client time, but also the queueing delay on both client and server sides which were sometimes significant in terms of the global critical path. I would suggest we model things like those enqueue/dequeue events via Logs in the opentracing data model.

Dec 30 '15 03:12 bhs

@adriancole capturing firewall hop as a Log in the shared span doesn't seem useful due to the clock skew. The only thing it tells is "yep, we pass through the firewall", as the timestamp cannot be reasoned about without a lot of additional alignment logic. We (at Uber) decided to model proxy/router hops, such as haproxy, as nested spans (per initial post). It makes Dependencies graph job a bit harder, but not impossible since it just needs to know that "haproxy" service is a middleware and treat it as pass-through for the purpose of service-to-service dependency derivation. I haven't got around to implementing it yet, there will be a patch to the zipkin-dependencies.

Agree on the SpanKind, I've already run into needing it when using OpenTracing API.

I suggest we keep this issue open until we have a good proposal for the happens-before use case, it's the one I primarily had in mind. I think we have a general idea, just need to come up with a concrete proposal. I agree with @bensigelman that it's not a very pressing issue.

Dec 30 '15 05:12 yurishkuro

Yuri, what you've said makes sense.

How about we rename this issue? What confused me was that i misunderstood your goal. I thought it was to collapse middleware. it seems we are most interested in sequence, aka happens before, right? Let's make this the issue title since you don't want this closed until we have support for that

Dec 30 '15 05:12 codefromthecrypt

This topic of happens-before is one that circles quite often, and usually Yuri makes comments.

If we are to re-purpose this issue to solve that, we'd be best using the context we've collected here:

Scroll to Frame granularity and Sequencing (aka Local Spans) https://docs.google.com/document/d/1ixxEs9TvhiGjJObGbRSPhSna3zHdadoUTQIZ5JKgLzU/edit#heading=h.wkls421pevch

Dec 30 '15 06:12 codefromthecrypt

suppose another way to address this is to add a task list https://github.com/blog/1375-task-lists-in-gfm-issues-pulls-comments to this issue with the dependencies before closing. Then, open top-level issues relating to that checklist.

Ex. we've at least sequencing, if not typing (SpanKind), right? Then, once all the dependencies are solved, we can have a concrete answer to the hide proxy thing (which I agree is useful), and also have transparency into what we need to solve.

sg?

Dec 30 '15 06:12 codefromthecrypt

@yurishkuro Has this issue been fixed because I noticed that my middleware authentication and authorization spans seem to finish only at the end of a trace with subsequent spans visible as a subset though they are sibling and not child spans.

Dec 12 '18 06:12 SwarnimRaj

it has not been fixed. It should also be moved to the Specification repo.

Dec 13 '18 08:12 yurishkuro

Joining this conversation from https://github.com/jaegertracing/jaeger-ui/issues/390 - it looks like a solution is needed to express this kind of sequence/sibling relationship in order for visualizers like Jaeger-UI to reduce staircasing.
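
To make the visualizer side concrete, here is a small sketch of how a UI could order siblings from such a relationship instead of from skewed timestamps. It assumes each sibling records at most one predecessor (in the spirit of the HappensAfterSpan tag proposed earlier in this thread) and that the siblings form a single chain without cycles; `order_siblings` is a hypothetical helper, not Jaeger-UI code:

```python
from typing import Dict, List, Optional


def order_siblings(happens_after: Dict[str, Optional[str]]) -> List[str]:
    """Order siblings along their happens-after chain.

    happens_after maps each span ID to the sibling it follows,
    or to None for the first sibling in the chain.
    """
    # Invert the mapping: predecessor -> the span that follows it.
    successor = {prev: sid for sid, prev in happens_after.items() if prev is not None}
    current = next(sid for sid, prev in happens_after.items() if prev is None)
    ordered = [current]
    while current in successor:
        current = successor[current]
        ordered.append(current)
    return ordered


siblings = {"server": "hop2", "hop1": None, "hop2": "hop1"}
print(order_siblings(siblings))  # ['hop1', 'hop2', 'server']
```

A renderer with this order in hand could lay the siblings out left to right and adjust their start times for display, rather than trusting clocks on different hosts.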

The discussion here is quite old - how much is still true, and what needs to happen next? @adriancole mentioned defining a to-do list with the dependencies, but what actually are those dependencies right now?

Jun 11 '19 10:06 richard-fine
