
Add a relationship to a span part-way through

Open JonathanMace opened this issue 9 years ago • 4 comments

Current APIs seem to support adding relationships to a span only at construction time.

However, it would be useful to add relationships part-way through a span.

For example, communication between two spans, such as receiving an RPC response, or streaming packets.

This would also enable baggage to be propagated inside communication between two long-running spans.
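A minimal sketch of what such an API could look like, for the sake of discussion. Everything in it is hypothetical: `Span`, `Reference`, `start_span`, `add_reference`, and the `response_from` reference type are illustrative names in a toy model, not part of any existing tracer API. The only behaviour taken from above is that references are normally supplied when the span is constructed.

```python
import itertools
import time
from dataclasses import dataclass, field

_ids = itertools.count(1)


@dataclass(frozen=True)
class Reference:
    """A typed link to another span (hypothetical model)."""
    ref_type: str      # e.g. "child_of", or an assumed new type "response_from"
    span_id: int       # id of the span being referenced
    timestamp: float   # when the relationship was established


@dataclass
class Span:
    operation_name: str
    span_id: int = field(default_factory=lambda: next(_ids))
    references: list = field(default_factory=list)
    finished: bool = False

    def add_reference(self, ref_type, other):
        """Hypothetical mid-span call: link this span to `other` after it started."""
        if self.finished:
            raise RuntimeError("cannot add a reference to a finished span")
        ref = Reference(ref_type, other.span_id, time.time())
        self.references.append(ref)
        return ref

    def finish(self):
        self.finished = True


def start_span(operation_name, child_of=None):
    """Construction-time reference, the only kind the issue says is supported today."""
    span = Span(operation_name)
    if child_of is not None:
        span.references.append(Reference("child_of", child_of.span_id, time.time()))
    return span


client = start_span("rpc-client")
server = start_span("rpc-server", child_of=client)
server.finish()

# The RPC response arrives while the client span is still open,
# so the client records the reverse relationship part-way through.
client.add_reference("response_from", server)
client.finish()
```

Recording a timestamp on the reference is a design assumption: for streaming traffic, one span could accumulate several references over its lifetime, and the time each one was established would distinguish them.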

JonathanMace, Feb 21 '17 12:02

> For example, communication between two spans, such as receiving an RPC response, or streaming packets.

The common pattern here is

[ inbound server span  1           ]
   [outbound RPC span  2     ]
- - - - - - - - - - - - - - - - - - -  network
     [inbound server span 3 ]

with relationships 3 <- childof <- 2 <- childof <- 1.

Could you elaborate on when receiving an RPC response requires establishing a new relationship?

yurishkuro, Feb 21 '17 14:02

@JonathanMace

> receiving an RPC response

Why does receiving a response need a new span? When A -> B, A's client span can deal with the response, if you want.

wu-sheng, Feb 22 '17 02:02

In short, if a parent span ever blocks on a child completing, then that is a causal relationship from child to parent. Currently, we do not record this relationship in a principled way, either as an explicit reference or as a standard annotation. At best, it is captured with some ad-hoc annotation.

It is necessary to record all causal relationships for two reasons: first is for dependency and critical path analysis; second is for correct back-propagation of baggage.

Dependency and critical path analysis

Currently if a parent blocks on a child we do not capture that information. We do not record in a principled way whether the parent spun off the child asynchronously or if the parent blocked on the child. This is an extremely important causal relationship to record if you care about critical path latency or understanding the dependencies between children within a single span.

For example, Google encountered this in the following two papers: "Modeling the Parallel Execution of Black-Box Services", Mann et al., HotCloud '11, and "Diagnosing Latency in Multi-Tier Black-Box Services", Ostrowski et al., LADIS '11.

The figure below, taken from Figure 1 of "Diagnosing Latency in Multi-Tier Black-Box Services", illustrates the difference: if you don't record the child response, you get the top; if you do record it, you get the bottom.

[Image: screenshot taken 2017-02-13 at 11:17:50 am]

There are further examples I can happily share if you want to discuss this one in more detail.
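To make the critical path argument concrete, here is a deliberately simplified sketch. It is not the method used in the papers cited above, and all names (`Span`, `joined`, `critical_path`, `guess_without_joins`, the example operations and timings) are made up for illustration. It only shows that, without a recorded "parent blocked on child" edge, a tool has to guess which child mattered.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start: float
    end: float
    children: list = field(default_factory=list)  # spans this span started
    joined: list = field(default_factory=list)    # children this span blocked on


def critical_path(span):
    """Follow only recorded joins: the last joined child to finish gated the parent."""
    path = [span.name]
    if span.joined:
        last = max(span.joined, key=lambda c: c.end)
        path += critical_path(last)
    return path


def guess_without_joins(span):
    """Heuristic when joins are not recorded: assume the last child to finish mattered."""
    path = [span.name]
    if span.children:
        last = max(span.children, key=lambda c: c.end)
        path += guess_without_joins(last)
    return path


cache = Span("cache-lookup", start=5, end=20)
db = Span("db-query", start=10, end=90)
audit = Span("audit-log", start=10, end=95)   # fired asynchronously, never awaited

frontend = Span(
    "frontend", start=0, end=100,
    children=[cache, db, audit],
    joined=[cache, db],                        # the parent blocked on these two only
)

with_joins = critical_path(frontend)           # ["frontend", "db-query"]
without_joins = guess_without_joins(frontend)  # ["frontend", "audit-log"]
```

With the join edges recorded, the slow database query is identified as the span that gated the parent. Without them, the timing heuristic blames the asynchronous audit write, which the parent never waited for.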

Back-propagation of baggage

OpenTracing currently supports only forward-propagation of baggage. Forward-propagation is when baggage is copied to child spans. However, if there is a reverse causal dependency from the child back to the parent (i.e., the parent blocks on the child), baggage is not propagated back to the parent. Baggage is supposed to follow the causal execution path of requests, so this is a missing relationship.

This means that if you set some baggage in a child, you won't be able to see it in the parent. For example, if a child blocks for a long time on a database, you might store this statistic in the baggage. The parent will block, because the child took a long time, but will not be able to find out why it blocked.
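A small sketch of the difference between forward-only propagation and back-propagation on a join. This is a toy model, not a real tracer: the `Span` class, `start_child`, `join`, and the baggage keys are hypothetical, and the merge policy (the child's value wins on conflicting keys) is an assumption that a real design would need to decide.

```python
class Span:
    """Toy span that carries baggage as a plain dict (hypothetical model)."""

    def __init__(self, name, baggage=None):
        self.name = name
        self.baggage = dict(baggage or {})

    def start_child(self, name):
        # Forward propagation: the child starts with a copy of the parent's baggage.
        return Span(name, self.baggage)

    def set_baggage_item(self, key, value):
        self.baggage[key] = value

    def get_baggage_item(self, key):
        return self.baggage.get(key)

    def join(self, child):
        # Hypothetical back-propagation: when the parent blocks on the child
        # and the child completes, merge the child's baggage into the parent.
        # Assumed policy: the child's value wins on conflicting keys.
        self.baggage.update(child.baggage)


parent = Span("handle-request")
parent.set_baggage_item("tenant", "acme")

child = parent.start_child("query-db")
child.set_baggage_item("db.wait_ms", "850")   # statistic recorded in the child

before = parent.get_baggage_item("db.wait_ms")  # None: forward-only propagation
parent.join(child)                               # parent blocked on the child
after = parent.get_baggage_item("db.wait_ms")   # "850": visible after the join
```

Copying the dict in `start_child` is what keeps the child's later writes invisible to the parent until the explicit join, which mirrors the forward-only behaviour described above.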

JonathanMace, Feb 22 '17 13:02

Also see HTrace equivalent: https://issues.apache.org/jira/browse/HTRACE-118

JonathanMace, Mar 17 '17 19:03