specification icon indicating copy to clipboard operation
specification copied to clipboard

Standard(s) for in-process propagation

Open yurishkuro opened this issue 9 years ago • 80 comments

OpenTracing 1.x leaves the question of in-process context propagation entirely in the hands of the applications themselves, which substantially reduces the usefulness of the standard since different frameworks instrumented with OT do not have a way to exchange context.

Standards for in-process context propagation inevitably have to be specific to individual languages. This is more of a placeholder issue. It is also somewhat similar to #22 (possible part of it), but can be done independently.

yurishkuro avatar Dec 04 '16 01:12 yurishkuro

In-process propagation, and support different frameworks instrumented, seems very hard, as I known. And different platforms have different characteristics.

In Java, same Context in ThreadLocal, and provide inter-thread propagation spec, maybe a effective way.

Keep me in the loop. :)

wu-sheng avatar Dec 04 '16 04:12 wu-sheng

Last july @Xylus and @eirslett did a couple day spike on context. Assets from those meetings are here; https://drive.google.com/drive/u/1/folders/0B0tSnQT3uGdAfndwS3RKVnMtM0ZKMWU4R1dicTNEXy1NSlBJaTlJQ05abGVneTlucW5HdFk

Eirik ended up spiking an effort towards that later, https://github.com/DistributedTracing/continuation-local-storage-jvm/pull/1

Ironically, last december, iirc, distributed context propagation was the first name of the OpenTracing group, so not too surprising it is here. Since then, I think uber had some work towards it naming the group https://github.com/openctx

More recently, recently google are changing their own context to decouple storage from it. https://github.com/grpc/grpc-java/pull/2461 Also, other tools like Log4J2 have added hooks to interop with other's context storage engines.

As mentioned in another issue, there's some layering concern, for example if tracing is the right place to define context (as they are often multi-purpose and below tracing). That said, whatever hooks there are tracing is at least a primary consumer, and interesting to see a tracking issue arise.

codefromthecrypt avatar Dec 04 '16 04:12 codefromthecrypt

@adriancole what I find peculiar is that several similar concepts of context storage/provider I've seen (including Jaeger's) pretend that they are just an interface, but in fact they cannot work with anything but a thread-local based implementation. For example, TChannel (which is a form of RPC framework) can be configured with a TracingContext, as a way of abstracting the actual mechanism of in-process propagation, but the mere fact that a context object is not passed to RPC methods but instead extracted via that abstraction essentially means the abstraction must be implemented via thread-locals.

NB: the TracingContext in Jaeger and TChannel is to a Span what gRPC Storage is to a Context. I.e. the same separation of how something is propagated vs. what is being propagated. They are just smaller in scope and only propagate a single Span.

yurishkuro avatar Dec 04 '16 05:12 yurishkuro

@adriancole what I find peculiar is that several similar concepts of context storage/provider I've seen (including Jaeger's) pretend that they are just an interface, but in fact they cannot work with anything but a thread-local based implementation.

interesting point for JVM tech. yeah different types of thread locals, wrappers of thread locals, or thread-local references to something else.. all have thread locals in there somewhere. Maybe there's some subtle (or native) option I'm also unaware of. @raphw @tmontgomery you know magic ( or at least where bodies are often buried ) .. do you know something available on JVM that can be used to store incidental trace state which isn't directly or indirectly a thread local?

codefromthecrypt avatar Dec 04 '16 05:12 codefromthecrypt

As a matter of fact, I have written a small thread-local-ish utility for this exact purpose. The problem with thread locals is that it only allows access from within a Thread. It is not possible to set an object from without a Thread even if the latter is known.

Also, I have started migrating away from the above DetachedThreadLocal and rather use the WeakConcurrentMap. The latter allows to define a form of key object that represents a state. This state is automatically garbage collected once the key is eligible for collection. Propagation works by sharing such keys where the context is removed once a context has become obsolete.

All in all, the problem always boils down to managing a form of global-state what is always tedious.

raphw avatar Dec 05 '16 10:12 raphw

@adriancole what is incidental trace state? Why should be stored in something like threadlocal?

wu-sheng avatar Dec 05 '16 14:12 wu-sheng

To all, threadlocal can't solve all problems, somthing like async process model will bring us more.

For example, Disruptor is a very useful in-memory async lib. If application use this, all context base on threadlocal will fail without doubt.

This is very important for my exp.

wu-sheng avatar Dec 05 '16 14:12 wu-sheng

@adriancole https://github.com/adriancole what is incidental trace state? Why should be stored in something like threadlocal?

sorry poor choice of words. Most users aren't aware of trace state as it is implicitly passed around on their behalf (of course some are). The usual implicit mechanism in Java is a thread local, though it is problematic in async libraries. For example, usually you have to manually re-attach the state when some accident of thread scheduling loses it. Ex the below from Brave + RxJava

public Action0 onSchedule(final Action0 action) { final ServerSpanThreadBinder binder = brave.serverSpanThreadBinder(); final ServerSpan span = binder.getCurrentServerSpan(); return new Action0() { @Override public void call() { binder.setCurrentSpan(span); action.call(); } }; } https://github.com/smoketurner/dropwizard-zipkin/blob/master/zipkin-rx/src/main/java/com/smoketurner/dropwizard/zipkin/rx/BraveRxJavaSchedulersHook.java

One of the reasons I asked about alternatives to thread locals was to figure out what current practices exist, as it is helpful to scope what might end up in a spec or library here or elsewhere.

make sense?

codefromthecrypt avatar Dec 05 '16 14:12 codefromthecrypt

@adriancole, yes, that is my concern.

wu-sheng avatar Dec 05 '16 14:12 wu-sheng

But I think only change threadlocal to something similar is not explicit enough to someone, who is not a tracer developer.

Provide some APIs, like carrier, inject, extract for rpc, are better choice for me. They will be easy to understand.

wu-sheng avatar Dec 05 '16 15:12 wu-sheng

In terms of the specification, the more interesting question for me is the contract of this API:

  • Should it be possible to store just one span in the context?
    • This would probably be e.g. the RPC server span and all lower-level components would be child spans of this one span without any nesting.
  • Should it be implemented as a stack?
    • How do you know on POP that you got "your" span? (in case a child component failed to close/pop its span or called pop too often)
    • How does a component know that it is the outermost component and that it should close any open spans. Example: The RPC server creates a "root span" and adds it to the stack, a lower-level component (e.g. a DB access library) opens a child span but the code failed with an exception. The request exception handler now needs to know that the DB span must be closed.
  • Should it be implemented as a list or a tree to allow for more reference types?

It might be easier to decide which language-specific storage construct might be the right one, when we know which features we want it to support.

cwe1ss avatar Dec 06 '16 13:12 cwe1ss

I am creating a library called 'context-propagation' meant purely to propagate any (threadlocal) context implementation from a parent-thread to any background thread in a stack-like fashion, using the Autocloseable try-with-resources mechanism to allow verified stack unrolling (and automatic skipping if a pop is missed). It revolves around a Context and ContextManager interface (each with only two methods) and a utility to create a snapshot from all known ContextManager implementations at once. If you're interested, please take a look here and let me know what you think: https://github.com/talsma-ict/context-propagation

(we needed it for security propagation, but are now looking into adding opentracing and would like to re-use the mechanism that is currently in-place that serves us well).

sjoerdtalsma avatar Dec 06 '16 19:12 sjoerdtalsma

@cwe1ss I think the only requirement is "I need to get the current span". The fact that Java implementation often uses stack is a side-effect of thread-local implementation, since you're using the same API to retrieve different spans at different times, and stack is the only thing that makes sense for that.

yurishkuro avatar Dec 06 '16 19:12 yurishkuro

@yurishkuro there's also the logical requirement to "set the current span". Do you think this is implicit - meaning that every span that is created is automatically "the current span" or do you see this as an explicit feature/step?

In other words, do you think that the context should be part of the tracer interface in which a call to StartSpan would automatically store the span in the context and something new like tracer.CurrentSpan would return it from the context? Or would you prefer a new interface like traceContext that must be called explicitly by the user after a span has been created - e.g. traceContext.SetCurrentSpan and traceContext.GetCurrentSpan?

The former would make for a nice user-friendly API and the latter seems to be closer to what we have now in jaeger-context and my C# contrib library. But both have advantages and disadvantages that need further discussions of course.

cwe1ss avatar Dec 06 '16 21:12 cwe1ss

Sorry, you're right, it needs get and set fort current span.

yurishkuro avatar Dec 06 '16 22:12 yurishkuro

@sjoerdtalsma thanks for sharing the context propagation lib... nice to see something that's already in use and not pre-coupled to a distributed tracing system per se.

@yurishkuro @cwe1ss I am prototyping a few approaches to make this pluggable... I would like OT to support in-process context managers without getting into the business of doing all of the work (since there are so many approaches and each has its advantages/disadvantages).

@cwe1ss as far as the stack concept is concerned: in Dapper we basically had each in-process Context object keep a reference to its parent Context; when finish() was called (or its moral equivalent... the naming was different), the Context would re-install its parent Context in the thread-local storage. So there was essentially a linked list of Context objects within the process.

bhs avatar Dec 07 '16 05:12 bhs

@bensigelman For what it's worth, I heavily depend on the the java.util.ServiceLoader to make my library pluggable. The abstract threadlocal implementation has the same 'linked stack' structure, except it tries to correct out-of-order closing as well.

Details on the out-of-order closing: If the closed context isn't 'current', current isn't touched. While the parent is already-closed, it is 'skipped' in the search for the restored 'current'. I am aware this should never happen in properly nested try-with-resources Autocloseable contexts.

Later today I'll be committing a 'java-globaltracer' project to github and planning to offer it to opentracing-contrib. There will be three main classes with only a single public class GlobalTracer (currently with 3 methods) following this design: http://bit.ly/2gkrRuo

sjoerdtalsma avatar Dec 07 '16 07:12 sjoerdtalsma

@sjoerdtalsma , can you post the site of java-globaltracer project. I have suggest earlier in https://github.com/opentracing/opentracing-java/issues/63. ServiceLoader to get a tracer maybe a good way. Even though, some tracer need arguments to init, like Jaeger Tracer by @yurishkuro told, they are not suitable for the module.

But I like this type of java tracer factory/manager very much.

wu-sheng avatar Dec 08 '16 06:12 wu-sheng

@wu-sheng @adriancole , I pushed an initial attempt (with a working unit test using the MockTracer) to github yesterday evening: https://github.com/talsma-ict/java-globaltracer

I would love to donate it to the 'contrib' git repository and have it published in Maven central under 'io.opentracing.contrib' but only after a thorough review process. Since I'm relatively new to the opentracing community I have no experience on the best way to go forward from here...

sjoerdtalsma avatar Dec 08 '16 08:12 sjoerdtalsma

@sjoerdtalsma @adriancole , maybe we should create a new issue to discuss this?

I review the project generally, let's leave the detail discuss later.

But at first, jdk level should be fixed. Using Optional<T> led to needing jdk 1.8+, but many applications based on jdk 1.6+. I think you should implement your design on jdk 1.6, maybe the code is less grace, but it will easier to use.

wu-sheng avatar Dec 08 '16 09:12 wu-sheng

@sjoerdtalsma https://github.com/sjoerdtalsma @adriancole https://github.com/adriancole , maybe we should create a new issue to discuss this?

One of the reasons we discussed thread local was yuri's question of whether we can assume thread locals are used or not. as far as I can tell, we cannot make that assumption. If that steers the api in a particular way, then the examples were fruitful.

I agree that it is probably helpful to keep weeds or investigations on different threads as doing so here might be overwhelming. probably best in the actual repo of the code in question?

codefromthecrypt avatar Dec 08 '16 09:12 codefromthecrypt

@wu-sheng @adriancole I welcome feedback on the design and implementation choices by github issues. https://github.com/talsma-ict/java-globaltracer I'm also willing to transfer this repository to https://github.com/opentracing-contrib if you think this contribution is worthwhile.

sjoerdtalsma avatar Dec 08 '16 09:12 sjoerdtalsma

@sjoerdtalsma , transfer to https://github.com/opentracing-contrib, LGTM. @yurishkuro @adriancole @bensigelman , what's your opinion?

Maybe accept it, discuss on the repo, and let's see what is going on later?

wu-sheng avatar Dec 08 '16 10:12 wu-sheng

@sjoerdtalsma my preference would be to

  1. create (empty, ASF2.0) repos on opentracing-contrib for the pieces you'd like to move over
  2. do a PR for the initial contents where we discuss the finer points of the code itself (and I would def rather not try to do that all here in this thread!)
  3. merge those PRs as soon as they're ready, etc.

bhs avatar Dec 08 '16 19:12 bhs

@bensigelman , can you add me to the https://github.com/opentracing-contrib org. I am interested in @sjoerdtalsma's project.

wu-sheng avatar Dec 09 '16 01:12 wu-sheng

@sjoerdtalsma , let's me know, after you create a repo, and do a PR

wu-sheng avatar Dec 09 '16 01:12 wu-sheng

@bensigelman I agree on all three of your points. re. 1: I cannot create this repo in opentracing-contrib, so it would be great if anyone with the power to do so could create a new ASF2.0 repo called 'java-globaltracer' or something similar. I'll take care of step 2 from there.

sjoerdtalsma avatar Dec 09 '16 08:12 sjoerdtalsma

@bensigelman , maybe you should create a repo for @sjoerdtalsma

wu-sheng avatar Dec 09 '16 09:12 wu-sheng

@sjoerdtalsma: https://github.com/opentracing-contrib/java-globaltracer/invitations gives you admin permissions... maybe we can start with a PR of the existing code?

bhs avatar Dec 09 '16 17:12 bhs

(See also: https://github.com/opentracing-contrib/java-globaltracer/pull/1)

bhs avatar Dec 11 '16 23:12 bhs