opentelemetry-specification

Sync and Async children (FOLLOWS_FROM)

Open tedsuo opened this issue 5 years ago • 28 comments

In OpenTracing, we have CHILD_OF and FOLLOWS_FROM. In the new project, we are considering whether to include this concept as a flag on the SpanBuilder when setting the span parent. The proposed new names are sync and async children, to make the relationship clearer.

Reference PR: https://github.com/bogdandrutu/openconsensus/pull/130

Questions:

  • Do we still want this at all? It can be useful for critical path and other types of trace analysis.
  • Do we also need an unknown flag as well?

tedsuo, Apr 22, 2019

@tedsuo How would you label a server span whose parent is propagated from an HTTP request?

Perhaps a better name would be direct/indirect? (Anything pulled directly from the current context would be direct; otherwise, indirect.)

Perhaps that wouldn't give you the information you need for critical path analysis, though. Part of the problem is that only the caller really knows whether it is blocking, whereas the link/relationship is established by the callee.
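A minimal sketch of the direct/indirect rule, assuming a hypothetical `parentRelationship` helper (not an OpenTelemetry API). Under this rule, a server span whose parent was extracted from HTTP headers and passed in explicitly comes out "indirect", and nothing in the result says whether the caller is blocked, which is the limitation raised above.

```javascript
// Hypothetical sketch of the direct/indirect rule; not an OpenTelemetry API.
// `options.parent` is an explicitly supplied parent (e.g. one extracted from
// HTTP headers); `activeSpan` is whatever the in-process context currently holds.
function parentRelationship(options, activeSpan) {
  if (options.parent === undefined) {
    // Parent pulled implicitly from the current context.
    return activeSpan
      ? { parent: activeSpan, kind: 'direct' }
      : { parent: null, kind: 'root' }
  }
  // Explicit parent: direct only if it happens to be the active span.
  return {
    parent: options.parent,
    kind: options.parent === activeSpan ? 'direct' : 'indirect',
  }
}

const active = { name: 'handler' }
const remote = { name: 'remote-caller' }
console.log(parentRelationship({}, active).kind)                 // direct
console.log(parentRelationship({ parent: remote }, active).kind) // indirect
console.log(parentRelationship({}, null).kind)                   // root
```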

(I guess some of this was already discussed in #14.)

tylerbenson, May 2, 2019

I don't think sync and async are the correct semantics for this. My understanding of ChildOf and FollowsFrom is that ChildOf is the previous operation that directly created the new span, and FollowsFrom is any previous operation that indirectly caused the new span. This is unrelated to the references being asynchronous or not. Both potentially have value but they are conceptually different.

Promises in JavaScript are an example of this. The ChildOf reference would generally be the span in the scope where promise.then() was called, and FollowsFrom would be, for example, the span where resolve() was called.

My take on the questions above:

Do we still want this at all? It can be useful for critical path and other types of trace analysis.

I think both ChildOf/FollowsFrom and sync/async potentially have value for different reasons.

Do we also need an unknown flag as well?

What is the case where this would not be known? I think this is something that is always known in advance.

rochdev, May 22, 2019

@rochdev Hey, sorry for the late answer.

I agree with what you wrote, but what names do you think we should use? @tylerbenson already mentioned direct/indirect as an option; if you have something else in mind, feel free to propose it.

carlosalberto, Jun 3, 2019

I think ChildOf and FollowsFrom made a lot of sense. To be honest, semantically I think OpenTracing got a lot of things right.

How was this relationship called in OpenCensus?

rochdev, Jun 3, 2019

@rochdev I am worried that your understanding of childOf and followsFrom is different from everyone else's. What you explained is different from what others explained to me, and I am now very confused about the correct meaning of childOf vs. followsFrom.

bogdandrutu, Jun 3, 2019

Moving to the API revision milestone on the specification. We need to collect more feedback.

SergeyKanzhelev, Jun 3, 2019

In Node this concept is very important and is core to how context propagation works in the runtime itself. For this purpose specifically, we call them execution and trigger. In Node you can only have one of each because it's limited to function calls, but from a tracing perspective it makes sense that you could follow from multiple different operations.

Let me give an example specific to Node at the language level:

```javascript
const promise = new Promise((resolve, reject) => {
  resolve() // execution ID here is 2
})

// execution ID here is 1
promise.then(() => {
  // here, execution ID is 1 and trigger ID is 2
})
```

The reasoning for the above is that resolve() is what triggered the execution of the callback, but the callback was actually registered when then() was called, so that's its execution parent.
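Node exposes these two IDs through `async_hooks`. The sketch below follows the promise-tracking example in the Node.js `async_hooks` documentation; the concrete ID values depend on which resources the program creates, so they will not literally be 1 and 2 as in the simplified example above.

```javascript
const { createHook, executionAsyncId, triggerAsyncId } = require('async_hooks')

// Per the Node.js async_hooks docs, promise tracking is only enabled once a
// hook is installed, so an empty init hook is enough to turn it on.
createHook({ init() {} }).enable()

const observedIds = Promise.resolve(42).then(() => {
  // execution: the async resource this callback runs as
  // trigger:   the async resource whose completion caused it to run
  const ids = { execution: executionAsyncId(), trigger: triggerAsyncId() }
  console.log(ids) // exact numbers vary by program
  return ids
})
```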

For a case like this, we could then say that the callback is running as a ChildOf the context where then() was called, and FollowsFrom the context where resolve() was called.
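A minimal sketch of that mapping, assuming a hypothetical hand-rolled tracer (not the OpenTelemetry API) where a global `currentSpan` variable stands in for real async context propagation. The deferred remembers which span was current when `then()` registered the callback (ChildOf) and when `resolve()` was called (FollowsFrom).

```javascript
// Hypothetical minimal tracer; not the OpenTelemetry API.
let currentSpan = null

function startSpan(name, refs = {}) {
  return { name, childOf: refs.childOf || null, followsFrom: refs.followsFrom || [] }
}

// Run fn with `span` as the current span, restoring the previous one afterwards.
function withSpan(span, fn) {
  const previous = currentSpan
  currentSpan = span
  try {
    return fn()
  } finally {
    currentSpan = previous
  }
}

// A deferred that records the current span at resolve() time and at then() time,
// and gives the callback span one reference of each kind.
function tracedDeferred() {
  let resolvedIn = null
  let settle
  const promise = new Promise((resolve) => { settle = resolve })
  return {
    resolve(value) {
      resolvedIn = currentSpan
      settle(value)
    },
    then(callback) {
      const registeredIn = currentSpan
      return promise.then((value) => {
        const span = startSpan('callback', {
          childOf: registeredIn,                       // where then() was called
          followsFrom: resolvedIn ? [resolvedIn] : [], // where resolve() was called
        })
        return withSpan(span, () => callback(value, span))
      })
    },
  }
}

const deferred = tracedDeferred()
const registerSpan = startSpan('register')
const resolveSpan = startSpan('resolve')
const callbackSpan = withSpan(registerSpan, () => deferred.then((_, span) => span))
withSpan(resolveSpan, () => deferred.resolve(42))
callbackSpan.then((span) => {
  console.log(span.childOf.name, span.followsFrom[0].name) // register resolve
})
```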

It's possible I got this completely wrong and that ChildOf/FollowsFrom has nothing to do with the relationship described above. I think the best person to explain the real meaning of these is probably @tedsuo.

In general, I think the different wordings proposed in this thread make sense, but they don't necessarily map 1:1 with each other.

rochdev, Jun 3, 2019

The problem here is slightly bigger than what to call these. Kudos @rochdev for thinking OpenTracing got it right, but it didn't, not quite (cf. this blog post). The most fundamental question in analyzing the graph of events is Lamport's happens-before relationship. In the OpenTracing span model, the following holds:

parent.start happens-before child.start

That's it! Neither child-of nor follows-from implies any further causality. Child-of only means that the parent depends on the outcome of the child in some way. It doesn't mean the parent is blocked; it can be doing other things (so the sync/async naming isn't quite right). It doesn't mean the child completes before the parent; it looks that way, but the parent (RPC caller) may time out before the child (RPC server). In case of such a timeout, OpenTracing has no convention for how the parent should record that fact (sad face).
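A small sketch of the timeout case, using hypothetical span records with wall-clock timestamps in milliseconds (not any real data model). Only the start ordering is guaranteed; comparing timestamps recorded on different hosts is itself an approximation because of clock skew.

```javascript
// Hypothetical span records with wall-clock timestamps in ms.
// Only parent.start -> child.start is guaranteed by the span model.
function orderingFacts(parent, child) {
  return {
    startsOrdered: parent.start <= child.start, // expected to hold
    childEndsFirst: child.end <= parent.end,    // NOT guaranteed
  }
}

const rpcClient = { name: 'rpc-client', start: 0, end: 500 } // caller gave up after a 500 ms timeout
const rpcServer = { name: 'rpc-server', start: 20, end: 900 } // server kept working afterwards
console.log(orderingFacts(rpcClient, rpcServer))
// { startsOrdered: true, childEndsFirst: false }
```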

The difference between child-of and follows-from is useful in practice for calculating the critical path, but strictly speaking that calculation is not possible, since causality is not captured between the ends of spans; the critical path can only be calculated via a heuristic (I would love to be disproven on this!).
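To illustrate why this can only be a heuristic, here is a simplified sketch over a hypothetical span tree; it is not the algorithm of any particular tool. It walks backwards from the parent's end, repeatedly picking the CHILD_OF child that finished last before the cursor, and skips FOLLOWS_FROM children on the assumption that the parent does not wait for them. The inference rests entirely on end timestamps, which is exactly the causality the model does not capture.

```javascript
// Hypothetical span tree; `ref` is the reference type the child recorded.
function criticalPath(span) {
  const path = [span.name]
  let cursor = span.end
  const blockingKids = (span.children || [])
    .filter((c) => c.ref === 'CHILD_OF')   // FOLLOWS_FROM children are skipped
    .sort((a, b) => b.end - a.end)         // latest-finishing first
  for (const child of blockingKids) {
    // Guess: a child that finished before the cursor is what the parent waited on.
    if (child.end <= cursor) {
      path.push(...criticalPath(child))
      cursor = child.start
    }
  }
  return path // latest segment first
}

const trace = {
  name: 'root', start: 0, end: 100,
  children: [
    { name: 'a', ref: 'CHILD_OF', start: 10, end: 40 },
    { name: 'b', ref: 'CHILD_OF', start: 45, end: 90 },
    { name: 'c', ref: 'FOLLOWS_FROM', start: 50, end: 150 }, // e.g. an async consumer
    { name: 'd', ref: 'CHILD_OF', start: 20, end: 30 },      // overlaps a
  ],
}
console.log(criticalPath(trace)) // [ 'root', 'b', 'a' ]
```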

Another odd thing about child-of and follows-from is that it's the child span that defines the reference type, even though it describes the parent's dependency on the child's outcome. If you're a remote server, how do you even know whether the parent/caller depends on your outcome? I tend to think of this as a property of the protocol: the producer of a message to Kafka does not expect any response, so the receiver should use follows-from. The sender of an HTTP request does expect a response, so the server always uses child-of, even if the sender doesn't care about the outcome; in that case, the sender can internally create a follows-from span first and then a normal pair of RPC call spans. So it's possible to rationalize it this way, but it's still kind of dirty.
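The protocol-based rule and the fire-and-forget workaround can be sketched as follows. `receiverRefType`, `makeSpan`, and `fireAndForgetHttp` are hypothetical helpers, not a real tracing API; the protocol list is limited to the two examples in the comment.

```javascript
// Hypothetical helpers, not a real tracing API.
const CHILD_OF = 'CHILD_OF'
const FOLLOWS_FROM = 'FOLLOWS_FROM'

// Receiver side: the protocol, not the caller, decides the reference type.
function receiverRefType(protocol) {
  switch (protocol) {
    case 'kafka': // one-way message: the producer expects no response
      return FOLLOWS_FROM
    case 'http': // request/response: the caller expects a response
      return CHILD_OF
    default:
      throw new Error(`unknown protocol: ${protocol}`)
  }
}

function makeSpan(name, parent, ref) {
  return { name, parent: parent ? parent.name : null, ref }
}

// A sender that does not care about the HTTP outcome first creates a local
// FOLLOWS_FROM span, then a normal CHILD_OF client/server pair beneath it.
function fireAndForgetHttp(callerSpan) {
  const detached = makeSpan('send-async', callerSpan, FOLLOWS_FROM)
  const client = makeSpan('http-client', detached, CHILD_OF)
  const server = makeSpan('http-server', client, receiverRefType('http'))
  return [detached, client, server]
}

for (const s of fireAndForgetHttp(makeSpan('handler', null, null))) {
  console.log(`${s.name} ${s.ref} of ${s.parent}`)
}
// send-async FOLLOWS_FROM of handler
// http-client CHILD_OF of send-async
// http-server CHILD_OF of http-client
```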

Of course, there's always the argument that OpenTelemetry 1.0 is not supposed to improve upon OpenTracing/OpenCensus (convergence is more important than improvements), in which case it doesn't matter much what we call these, because the model would need to be revisited anyway.

yurishkuro, Jun 4, 2019

Thanks for the clarification, @yurishkuro! It sounds like this is a larger discussion. Would it make more sense to wait instead of implementing this, knowing that it's not necessarily the correct way to model this relationship?

Of course, this depends on whether users currently depend on the feature. If they do, then I think we should gather more information about exactly how it's used, which would give us a better understanding of what the currently used feature should be called.

rochdev, Jun 4, 2019

I think a flag for sync/async (blocking/non-blocking) spans would be very useful for trace analysis. It would make it much easier and quicker to identify hot spots along the critical path and, thus, the "root causes" of long trace timings. Such a flag is important because it indicates whether a child span's time is included in the parent's time or not, and therefore whether a parent span is slow because of the child span or independently of it. Without such a flag, this question cannot be answered.

I agree that in many cases only the caller knows whether a child span is blocking, but in such cases this information could be propagated with the context, so that it can be recorded on the child span's side.
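A sketch of the "is the child's time included in the parent's time" question, assuming a hypothetical boolean `blocking` field on each child span. Self time is the parent's duration minus the union of intervals spent waiting on blocking children; overlapping waits count once, and non-blocking children are ignored.

```javascript
// Hypothetical span records (ms); `blocking` is an assumed per-child flag.
function selfTime(parent) {
  const waits = parent.children
    .filter((c) => c.blocking)
    // Clip each child to the parent's own window.
    .map((c) => [Math.max(c.start, parent.start), Math.min(c.end, parent.end)])
    .filter(([s, e]) => e > s)
    .sort((x, y) => x[0] - y[0])

  let covered = 0
  let runStart = null
  let runEnd = null
  for (const [s, e] of waits) {
    if (runEnd === null || s > runEnd) {
      // Close the previous merged run and start a new one.
      if (runEnd !== null) covered += runEnd - runStart
      runStart = s
      runEnd = e
    } else {
      runEnd = Math.max(runEnd, e) // overlapping waits are counted once
    }
  }
  if (runEnd !== null) covered += runEnd - runStart
  return parent.end - parent.start - covered
}

const handler = {
  start: 0, end: 100,
  children: [
    { name: 'db-query', blocking: true, start: 10, end: 40 },
    { name: 'cache-call', blocking: true, start: 30, end: 50 },
    { name: 'send-email', blocking: false, start: 60, end: 200 },
  ],
}
console.log(selfTime(handler)) // 60

// Without the flag, every child would have to be treated as blocking:
const noFlag = { ...handler, children: handler.children.map((c) => ({ ...c, blocking: true })) }
console.log(selfTime(noFlag)) // 20
```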
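One way to carry the caller's intent to the callee is a baggage entry. This sketch assumes a hypothetical key `caller.blocking` (not a standard name) and a simplified parser for the W3C Baggage header format that drops `;` properties and ignores percent-encoding. An absent entry maps to `undefined`, which also speaks to the "unknown" flag question from the original issue.

```javascript
// Hypothetical baggage key; not a standard name.
const HINT_KEY = 'caller.blocking'

// Caller side: state whether it will wait for the response.
function injectBlockingHint(headers, blocking) {
  const entry = `${HINT_KEY}=${blocking}`
  headers.baggage = headers.baggage ? `${headers.baggage},${entry}` : entry
  return headers
}

// Callee side: read the hint back so it can be recorded on the server span.
// Simplified: ignores percent-encoding and drops any ";property" suffix.
function extractBlockingHint(headers) {
  for (const member of (headers.baggage || '').split(',')) {
    const [key, value] = member.split(';')[0].split('=').map((s) => s.trim())
    if (key === HINT_KEY) return value === 'true'
  }
  return undefined // unknown: the caller did not say
}

const outgoing = injectBlockingHint({ baggage: 'tenant=acme' }, false)
console.log(outgoing.baggage)              // tenant=acme,caller.blocking=false
console.log(extractBlockingHint(outgoing)) // false
console.log(extractBlockingHint({}))       // undefined
```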

AlexanderWert, Sep 9, 2019

To fully capture all (or at least more) of the possible relationships between spans, in addition to the create(parent) API, we would need APIs that signal that a span begins/ends waiting for a particular (set of) child(ren), and possibly even that it consumes the result of a particular child. Of course, we would need to identify a child on the parent side without the child communicating back its span ID, which is a whole other problem (but can probably be solved elegantly by introducing IDs not only for the nodes but also for the edges in the span graph). Heck, you can even wait synchronously (occupying a thread) or asynchronously (using something like async/await, where other operations can be scheduled while a different one waits for I/O).

To show some difficult cases (pseudo-C#):

```csharp
// Start async (the child has no idea whether we used a blocking API or an async
// one, nor should it have to know).
Task<HttpResponseMessage> request = myClient.GetAsync("http://example.com/myAPI");
var myCalcResult = SomeExpensiveCalculation(); // Placeholder; could be its own subspan

// Wait for the child request, but not indefinitely (500 ms timeout)
Task completed = await Task.WhenAny(request, Task.Delay(500));
if (completed == request) {
  renderCompletion(myCalcResult, await request); // Consume result
} else {
  renderPleaseWait();

  // Another operation/span will consume the result (the handle-response part of the
  // client could become a child of the server span, or it could be an independent
  // root span with a CONSUMES relation to the server span).
  delayedRequests.AddPending(new PendingInfo(myCalcResult, request));
}
```
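The edge-ID idea can be sketched with a hypothetical recorder (none of these functions exist in any tracing library). Each spawned request gets its own edge ID, which would be propagated to the callee alongside the usual parent context, so the parent can record waiting and consumption without ever learning the child's span ID. Below, the timeout branch of the C# example is expressed as edge events.

```javascript
// Hypothetical API: edges between spans get their own IDs.
let nextEdgeId = 1

function edgeRecorder(parentName) {
  const events = []
  return {
    events,
    // Called where the request is started; edge.id would travel with the request.
    spawn(label) {
      const edge = { id: nextEdgeId++, label }
      events.push({ type: 'spawn', edge: edge.id, parent: parentName })
      return edge
    },
    beginWait(edge, t) { events.push({ type: 'beginWait', edge: edge.id, t }) },
    endWait(edge, t, outcome) { events.push({ type: 'endWait', edge: edge.id, t, outcome }) },
    consume(edge, consumer) { events.push({ type: 'consume', edge: edge.id, consumer }) },
  }
}

const rec = edgeRecorder('render-page')
const apiCall = rec.spawn('GET /myAPI')
rec.beginWait(apiCall, 100)
rec.endWait(apiCall, 600, 'timeout')
rec.consume(apiCall, 'delayed-request-handler') // a different span consumes the result
console.log(rec.events.map((e) => e.type).join(' -> '))
// spawn -> beginWait -> endWait -> consume
```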



Oberon00, Sep 9, 2019

Moving to v0.3

SergeyKanzhelev, Oct 3, 2019

Closing this, as it is accomplished through Links in the current spec.

jmacd, Jan 22, 2020

Links don't address the issue in this ticket. I suggest closing #86 instead, because there is more discussion here.

yurishkuro, Jan 22, 2020

Links do not solve the problems discussed here (e.g., only the parent knows whether it waits for the result, and hence whether this is a sync or async call).

Oberon00, Jan 22, 2020

Could you explain why the SpanKind field does not satisfy this?

jmacd, Jan 22, 2020

For example, it does not allow establishing a child-of/follows-from-like relationship between internal spans (because SpanKind is currently overloaded).

yurishkuro, Jan 22, 2020

Also, SpanKind is not an attribute of Links.

yurishkuro, Jan 22, 2020

Any update on this?

I think this integration would reduce frustration for people coming from OpenTracing and avoid unnecessary OpenTelemetry customization by users.

For asynchronous processing involving Kafka, RabbitMQ, etc., it would be nice to have a FOLLOWS_FROM span.

[Screenshot, 2020-03-06 at 13:54:01]

OlivierAlbertini, Mar 6, 2020

I'm noticing some merged changes related to this issue, but I'm not seeing the actual work in any release yet. What's the ETA for getting FOLLOWS_FROM functionality built into OpenTelemetry?

I have a NATS producer and want to continue to propagate the trace information in the NATS consumer, just like @OlivierAlbertini mentioned above.

jon-whit, May 27, 2020

Any update on this? This would be useful for messaging scenarios, as mentioned above. When the producer writes messages to the queue, the linked messages would be CHILD_OF the producer span; on the consumer side, when it reads the messages from the queue, the links to those messages would be FOLLOWS_FROM.
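A sketch of the consumer side using links, in a hypothetical in-memory model rather than the OpenTelemetry API. A link carries a span context plus attributes but no relationship type, so this sketch records the FOLLOWS_FROM intent under a made-up attribute key; that gap is the one pointed out earlier in the thread.

```javascript
// Hypothetical in-memory model, not the OpenTelemetry API.
function produce(producerSpan, payload) {
  // Each message carries the context of the span that sent it.
  return { payload, context: { traceId: producerSpan.traceId, spanId: producerSpan.spanId } }
}

function consumeBatch(messages) {
  return {
    name: 'process-batch',
    parent: null, // a batch has no single natural parent
    links: messages.map((m) => ({
      context: m.context,
      attributes: { 'link.relationship': 'follows_from' }, // hypothetical key
    })),
  }
}

const producerA = { traceId: 'trace-a', spanId: 'span-a' }
const producerB = { traceId: 'trace-b', spanId: 'span-b' }
const batch = consumeBatch([produce(producerA, 'm1'), produce(producerB, 'm2')])
console.log(batch.links.map((l) => l.context.traceId)) // [ 'trace-a', 'trace-b' ]
```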

Kanwaldeep, Jun 8, 2020

I have the feeling there are not enough people who want this strongly enough. If you want to get this rolling, you should probably make a spec PR or OTEP along with a prototype implementation PR, e.g., to opentelemetry-java or whatever your preferred language is (best to do this end-to-end, from the API to the Jaeger exporter, which I assume supports this span relationship). EDIT: Or try to bring it up in tomorrow's SIG spec Zoom meeting to see what others think.

Oberon00, Jun 8, 2020

Duplicate of #562

bogdandrutu, Jul 21, 2020

There is a lot of discussion in this ticket, while #562 is very short and looks more like an implementation detail than a spec question. I don't think they are duplicates, and even if they are I would keep this one for context.

yurishkuro, Jul 21, 2020

Reopening so we don't lose it.

yurishkuro, Jul 21, 2020

After giving this a try via #906, we decided to postpone it in order to develop it properly. Re-labeling it so we can add this feature after GA.

carlosalberto, Sep 2, 2020

Any update on this?

dotNetDR, Apr 18, 2022

+1

h1z3y3, Apr 25, 2022

Solved via span links.

austinlparker, Apr 23, 2024