opentelemetry-java icon indicating copy to clipboard operation
opentelemetry-java copied to clipboard

B3 Propagator can't properly handle missing `X-B3-Sampled` header

Open jkwatson opened this issue 3 years ago • 6 comments

If the X-B3-Sampled header is missing, the propagator is considering this as "not sampled" when creating the resulting TraceFlags. This will cause Samplers to then consider the parent as "not sampled" when actually B3 says that this should "leave the decision to the server".

I think this might be a bigger issue, however. When propagation is happening in Otel, we only have 2 options: sampled & not-sampled. We don't have a value that represents "you decide".

This might be an issue for the specs and interop with B3.

jkwatson avatar Aug 19 '21 15:08 jkwatson

Hi @jkwatson, there's another issue with B3 propagation and the X-B3-Sampled header: In the OpenZipkin B3 Propagation Spec it says

Sampling state is almost always sent with Identifiers, but it can be sent alone (a predefined decision).

With the OTel Java SDK, the sampling decision is only evaluated if both Trace and Span IDs can be extracted from B3 headers. Thus, OTel-Java is oblivious to external sampling decisions from systems that do not themselves partake in tracing.

https://github.com/open-telemetry/opentelemetry-java/blob/3518c98c457516374f1f5f3eacc0d7133cd67908/extensions/trace-propagators/src/main/java/io/opentelemetry/extension/trace/propagation/B3PropagatorExtractorMultipleHeaders.java#L33-L59

The main use case where I would like to only send the X-B3-Sampled header is to suppress trace creation for monitoring endpoints (think Kubernetes liveness and readiness probes). For now, we're able to work around this by sending 0-padded 1 trace and span IDs (traceid: 00000000000000000000000000000001, spanid: 0000000000000001).

flangknecht avatar Jan 17 '22 08:01 flangknecht

Thanks for bringing this up @flangknecht - I agree that we should respect the sampling-decision-only case. Unfortunately our code is quite entangled to the invalid span == no parent span, starting a new trace, so using 0 IDs to represent this would be tough in the SDK too. The only approach I can think of that doesn't cause these spans to participate in other areas, for example logs / exemplar injection (by using valid trace IDs like 000001) is to have a special SpanContext that is represented as invalid, except when making a parenting decision in SdkSpanBuilder here

https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk/trace/src/main/java/io/opentelemetry/sdk/trace/SdkSpanBuilder.java#L173

Does anyone else have any ideas on this?

anuraaga avatar Jan 17 '22 08:01 anuraaga

Well, exemplar I think already checks whether a span is sampled or not I think. Perhaps it's our logs injection logic which is a bit off, should we be checking sampled instead of invalid?

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/log4j/log4j-context-data/log4j-context-data-2.16/library-autoconfigure/src/main/java/io/opentelemetry/instrumentation/log4j/v2_16/OpenTelemetryContextDataProvider.java#L34

anuraaga avatar Jan 17 '22 08:01 anuraaga

As the original issue stated, I think this is a gap in otel's general interop with B3, and needs to be sorted in the specification, so it can be done uniformly across all the languages.

jkwatson avatar Jan 17 '22 18:01 jkwatson

I think a missing header delegating a sampling decision is a totally foreign concept and definitely needs to be in the spec.

An explicit zero is just a compression mechanism though so it seems like it should be simpler. But I guess it's true the main trickiness is treating no span in context identically to invalid span in context, and guess we need to see how to deal with that

anuraaga avatar Jan 18 '22 11:01 anuraaga

We hit the same issue.

From the library's point of view, can you atleast provide a way for users to over-ride the default behavior (of setting it to 0 = not sampled), so that we can set it to an appropriate value ourselves that fits our infra setup. I think it will be very helpful for people inter-operating (and transitioning slowly from b3 -> w3c) and solve compatibility issues in a company with tons of microservices on different versions of spring-boot for e.g.

E.g. the proxies usually generates the span/trace ids, but does not send sampling flag to the server. In this case, since a proxy is a given for us, this results into NO tracing at all for us (by default) when we want to support B3 in the input.

I have created an issue in spring-boot project which also has a sample project for re-producing the issue - if needed. https://github.com/spring-projects/spring-boot/issues/37994

dipsk2 avatar Oct 24 '23 11:10 dipsk2