opentelemetry-java icon indicating copy to clipboard operation
opentelemetry-java copied to clipboard

Turn off spans from a scope while retaining downstream spans and without breaking traces

Open jack-berg opened this issue 1 year ago • 4 comments

Suppose you have a span hierarchy of ({tracerName}:{spanKind}:{spanName}):

tracerA:server:parent -> tracerB:internal:child -> tracerC:client:grandchild

And you decide that internal spans from tracerB are too noisy. You want to turn them off while:

  1. Not losing downstream spans. You want to keep tracerC:client:grandchild because it contains important information about calls to downstream systems.
  2. Not producing broken traces. tracerC:client:grandchild's parent should be tracerA:server:parent, not tracerB:internal:child.

How can you accomplish this? Well, you pretty much can't. A couple of problems prevent it:

  • Samplers dont have access to scope, so there's no SDK configuration mechanism which can do this.
  • Even if Samplers could make decisions based on scope, the sampling decision options don't make the desired behavior easy.
    • If a sampler returns DROP for tracerB and delegates to ParentBased{root=AlwaysOn} for other spans: Then all descendants of tracerB:internal:child will be dropped as well. Not what we want.
    • If a sampler returns DROP for tracerB and delegates to ParentBased{root=AlwaysOn,localParentNotSampled=AlwaysOn} for other spans: Then we end up with a broken trace since tracerC:client:grandchild will assign its parent to be tracerB:internal:child, which is dropped.
    • If a sampler returns DROP for tracerB and delegates to ParentBased{root=AlwaysOn,localParentNotSampled=AlwaysOn} and instrumentation is aware and only sets spans in context if Span.isRecording()=true: Then we achieve the desired affect. tracerB:internal:child is dropped and tracerC:client:grandchild will assign its parent to be tracerA:server:parent, so we have a complete trace.
  • Even if Samplers could make decisions based on scope, its cumbersome to configure a sampler to have special rules for a scope. It's simpler to think of a sampling policy as orthogonal to the decision of which scopes / instrumentation libraries should be enabled.

So there is no way to get the behavior we want today. And even if samplers were given access to scope, we'd still need: a complex / non-standard sampling policy and coordination with instrumentation (i.e. checking Span.isRecording()==true). Ouch.

jack-berg avatar Feb 05 '24 19:02 jack-berg

Is there a tracking specification issue for this?

jkwatson avatar Feb 05 '24 23:02 jkwatson

I am going to try to solve this at the spec level. This problem is probably discussed in different terms in some open spec issues, but maybe not all in one place in clear terms. Will do some searching and create a new issue if needed, and link back to this issue in either case.

I wanted to create a java specific issue because it came up in the 2/1/24 java SIG meeting.

jack-berg avatar Feb 05 '24 23:02 jack-berg

Not producing broken traces. tracerC:client:grandchild's parent should be tracerA:server:parent, not tracerB:internal:child.

Something that hadn't occurred to me until reading this just now, is that maybe there's an argument to be made that the absence of tracerB:internal:child makes the trace broken, because it appears that parent directly interacts with the grandchild. Maybe it's easy to understand in this specific case (especially because the parent is internal), but you could imagine future scenarios where a trace could be very confusing if significant portions of the ancestry are absent.

"Like, how did execution go all the way from A all the way down to Z?"

I don't intend this to be in any way discouraging or blocking, just an aspect of "correctness" I hadn't considered before and hadn't yet heard discussed.

breedx-splk avatar Feb 06 '24 00:02 breedx-splk

@breedx-splk I think what your describing influenced the design of the samplers up to this point. Given that there's no way to produce this behavior today with SDK configuration, its safe to say that a high priority has been given to avoiding the potential confusion you describe. The obvious counter argument (which needs to be made anyway) is that just because some people would be confused by missing spans doesn't mean that others wouldn't prefer the option to omit certain spans. In other words, it should be possible to accommodate a variety of configurations.

jack-berg avatar Feb 07 '24 14:02 jack-berg