specification How can a tracer append data to an existing span, i.e. span ID that was created in another process

Issue by yurishkuro Friday Jan 15, 2016 at 15:43 GMT Originally opened as https://github.com/opentracing/opentracing.io/issues/22

A use case we're working on right now is we want haproxy to add Logs to the current trace that's passing through it. We already have something in place that can sniff headers from passing requests and write the out to kafka along with some timers. The plan is to have a custom reader of that kafka topic that will extract the timestamps and trace IDs of when the request was passing through haproxy and create additional span Logs that will be saved to Cassandra via somewhat of a back door. It would be nice to have a clean OpenTracing way of doing that, which requires not only the ability to inject the timestamps into Logs externally (per #20), but in our case do that without starting/finishing the span. I know Ben you will be against that last part :-), and say we could just create a new span instead, but doing so would require modifying in-flight request inside haproxy to replace the client trace/span ID with haproxy span ID, and that is much more intrusive way of doing it vs. just adding logs.

Nov 15 '16 23:11 opentracing-importer

Comment by yurishkuro Friday Jan 15, 2016 at 15:46 GMT

@bensigelman said

Oh, just saw your second message: so you basically want to be able to record logs independent of spans?

Most like adding logs to a span that was managed elsewhere.

Nov 15 '16 23:11 opentracing-importer

Comment by adriancole Saturday Jan 16, 2016 at 02:42 GMT

@bensigelman https://github.com/bensigelman said

Oh, just saw your second message: so you basically want to be able to record logs independent of spans?

Most like adding logs to a span that was managed elsewhere.

I think we are getting quite close to how zipkin (and I guess dapper) worked. Ex. Trace.record (Span.addAnnotation in zipkin) allows unrelated systems, who only have access to the span identifiers, to participate out-of-band. In fact, in zipkin, this is why each log (annotation) had an endpoint, as this was to tell who actually logged it (ex if it was a proxy).

In implementation, finagle prefers to send complete spans, but will still flush "incomplete ones" such as those containing just annotations. https://github.com/twitter/finagle/blob/develop/finagle-zipkin/src/main/scala/com/twitter/finagle/zipkin/thrift/DeadlineSpanMap.scala#L69

In practical terms, I'd expect OpenTracing's Tracer to be used as a partial api, as it may never have all features (ex Trace.clear()), etc. So one way to address this is just to fill features in the zipkin or otherwise tracers, and use deleting that code as a driver for an issue like this. Of course, I'd prefer to be able to account for "unmanaged events" now, as that solves various existing use cases.

Nov 15 '16 23:11 opentracing-importer

Comment by bensigelman Sunday Jan 31, 2016 at 18:49 GMT

@yurishkuro I'm going to rename this to be more general, hope that's ok. rename it back if not.

Nov 15 '16 23:11 opentracing-importer

Comment by yurishkuro Sunday Jan 31, 2016 at 20:26 GMT

@bensigelman don't mind the title change, they are both applicable and likely to be resolved by the same API.

I think one of the reasons Zipkin did not have the timestamp & duration in the spans is because of its backend ability to receive multiple submissions of the same span ID and aggregate events from all of them into a single timeline, which was then used to infer startTs and duration. This is what this ticket is really about. We chose a simpler API in OT for reporting spans via start/finish methods, which is great as it covers the majority of use cases, but for these less frequent cases it doesn't work and we need something else. It's not necessarily "past spans" as the routing layer may be emitting an event into the ongoing span ID, but both can be solved with the same API.

BTW, did Dapper require at most one process per span, the same way that htrace does? If so, I'm curious how it dealt with the ^^^ use cases.

Nov 15 '16 23:11 opentracing-importer

Comment by bensigelman Sunday Jan 31, 2016 at 21:41 GMT

Dapper used two processes for each RPC span (which I think is regrettable in retrospect)...

Nov 15 '16 23:11 opentracing-importer

Comment by bensigelman Sunday Jan 31, 2016 at 21:42 GMT

(renamed again!)

Nov 15 '16 23:11 opentracing-importer

Comment by bensigelman Wednesday Feb 10, 2016 at 05:46 GMT

This is basically the same thing as #20... I will close that one and cross-ref.

Nov 15 '16 23:11 opentracing-importer

Comment by bensigelman Wednesday Feb 17, 2016 at 22:24 GMT

@yurishkuro close-able now, yes?

Nov 15 '16 23:11 opentracing-importer

Comment by yurishkuro Wednesday Feb 17, 2016 at 22:47 GMT

Let me rename it to the original subject and keep open (and cc @adriancole). I think the use case kind of works for us due to Zipkin specifics.

The scenario is:

Process A calls process B over RPC
- Per Zipkin spec, they both report the same span ID
Process A marks the span with span.kind=RPC_CLIENT
Process B marks the span with span_kind=RPC_SERVER
- The span kind tags cause tracer to logs cs/cr events in A, and sr/ss events in B.

Meanwhile, the actual RPC request goes through haproxy, which is not able to report spans itself, but writes logs that are converted to spans by a separate process H:

H has access to headers and can do extractor.join_trace(timestamp=x)
- per Zipkin convention that creates the same span ID as A and B had
H logs two events, haproxy-in and haproxy-out, with historical timestamps
H calls finish(duration=y) on the span to get it reported

I think this should work (in Zipkin), due to the following "features":

Span from H is treated as local span, and therefore does not log cs/cr/ss/sr events
join_trace returns span with the same ID as on the wire
Zipkin UI merges submitted spans with the same ID, adjusts for clock skew in H and B timestamps, and merges startTime=min(...) and duration=max(...)

I don't think it will work in tracers that assign unique span ID to each process (two spans per RPC).

Nov 15 '16 23:11 opentracing-importer

Comment by bensigelman Wednesday Feb 17, 2016 at 22:55 GMT

I suppose a tracing impl could accept the span_id (or any other "tracer state") as a tag and use the fully-overloaded version of Tracer.start_span(...) to hard-code everything about said span. Not really in the spirit of "Span Tags" but would get the job done IIUC.

Nov 15 '16 23:11 opentracing-importer

Comment by adriancole Thursday Feb 18, 2016 at 07:14 GMT

@yurishkuro followed you until we got to where H is actually the one finishing the span.

What it sounds like is that the haproxy won't actually change the width of the span (startTs, duration), rather it would add annotations around the "sr" "ss".. I wouldn't say this has to be a "local span", rather a contribution to an existing span. Ex. H is adding two annotations, and needs to send required ids (currently trace_id, span_id, parent_id) to allocate them properly *. The (local)component tag is just a nice to have.

sideline on clock skew: Currently clock skew adjustment works in relation to the "core annotations", so I'm not sure if skew from H would actually .. unskew? solvable probably!

ps see this for maybe not requiring parent_id.

Nov 15 '16 23:11 opentracing-importer

Comment by yurishkuro Thursday Feb 18, 2016 at 12:21 GMT

@adriancole

followed you until we got to where H is actually the one finishing the span. What it sounds like is that the haproxy won't actually change the width of the span (startTs, duration),

well, yes that's the end goal, but OpenTracing API does not have a way to report anything about a span without starting and finishing it. That's why it's hacky - Zipkin will min/max the startTs/duration, so it will effectively "suppress" the fact that H started/finished span, and just treat it as if H simply added two non-core annotations. The fact that a span is "local' by default is important to ensure H does not emit core annotations.

You may be right about the clock skew. We probably need to revisit that thing anyway, to ensure startTs+duration take priority over core annotations in skew calculations.

Nov 15 '16 23:11 opentracing-importer

Comment by adriancole Thursday Feb 18, 2016 at 12:40 GMT

thx for the loopback i-i

Nov 15 '16 23:11 opentracing-importer

Comment by yurishkuro Monday Oct 17, 2016 at 19:25 GMT

@bensigelman coming back to this thread. We have a number of routing services, including haproxy, that need this kind of "add logs to existing span" solution, because injecting a child span between client and server is not always feasible (due to routing service's implementation language), and not always desirable (as it confuses the RPC parent/child view and service dependencies). And there's still no way in OT to attach data to an existing span from another process.

So concrete proposal: what if we had a special type of span reference for this scenario? Don't know the name of it, something like annotationsFor:

spCtx := tracer.Extract(Format.xxx, carrier)
span := tracer.StartSpan("router",
    opentracing.SpanReference(AnnotationsFor, spCtx)
)

Given this reference type, the implementations have a lot of flexibility on how to handle it. They can choose to share spanID or create a unique id, can merge the data into the lead span before or after storage, can handle it differently in the UI, etc.

Nov 15 '16 23:11 opentracing-importer

Comment by bensigelman Wednesday Oct 19, 2016 at 17:16 GMT

@yurishkuro sorry for the delay!

Hm, seems to me that this could be more like tracer.StartSpan("router", opentracing.SpanReference(Identity, spCtx)), which would generate a Span with the same ids/etc as spCtx. (Much like what jaeger / "zipkin classic" do for RPC server spans)

Thoughts?

If we wanted something more in the spirit of AnnotationsFor, I would probably suggest opentracing.About or something more general than AnnotationsFor, but that's just me.

Nov 15 '16 23:11 opentracing-importer

Comment by yurishkuro Wednesday Oct 19, 2016 at 18:36 GMT

About sounds good. Identity has a bit more confusing connotation - I am not sure I want to have that as a standard ref type. For example, in the classic zipkin model the server side span, even though it shares the span ID with the client span, nonetheless has its own start/end timestamps. With About span I could see tracing system ignoring the start/end timestamps and only using the logs/tags - it's really up to the implementation (specifically the UI).

Nov 15 '16 23:11 opentracing-importer

specification specification copied to clipboard

How can a tracer append data to an existing span, i.e. span ID that was created in another process

specification
specification copied to clipboard