opentelemetry-specification Conditionally disable context in outbound instrumentation to stop it from reaching third parties

trafficstars

What are you trying to achieve?

In our application we instrument all outgoing HTTP requests to track success rate, duration, and other standard attributes using the opentelemetry-python-contrib instrumentation for requests. We also instrument HTTP requests to internal services using the same library or related ones. I want to avoid sending context to third party servers so that they are not confused by trace state when they also implement opentelemetry, but I want to keep the instrumentation itself, and continue to send context to internal servers.

What did you expect to see?

The instrumentation libraries have environment variable controls for disabling instrumentation on certain paths:

export OTEL_PYTHON_REQUESTS_EXCLUDED_URLS="client/.*/info,healthcheck"

Having some thing similar along the lines of OTEL_EXCLUDE_CONTEXT_URLS would allow us to only send context (and presumedly baggage?) to internal applications and avoid it for third parties.

Additional context.

This first came up when I noticed traces with missing roots from an externally-facing API we run. A client had started implementing opentelemetry, and so their requests come with a trace ID and parent span ID. Our instrumentation makes use of that, and tries to continue that trace. But since the initial spans are solely within the client's cluster, there will never be a root.

This gets even more confusing when there is bidirectional communication.

A worker in system A sends a webhook to system B with context indicating the task execution.
System B continues that trace with their own API request/response. They will be missing the root.
System B then calls system A's API to gather more information. This API call then is linked in the same trace with the initial task, leading to a confusing trace that is seemingly unrelated and has many missing spans from system A.

At this point both system A and system B have a trace with missing information.

There are two sides to this:

A server should not continue a trace originating from outside its system. This can be implemented by having a load balancer or the application itself drop the headers before the instrumentation library makes a new span, although it is annoying to deal with this outside of the instrumentation library.
A client should not send context to continue an internal trace to a third party system. This should be done within the instrumentation library, as most outgoing requests generally do not have an outgoing proxy to conditionally filter headers.

Dec 20 '23 23:12 robotadam

One additional thought; I'm not totally clear on how baggage works but if baggage follows the same rules as context then this sort of feature could prevent baggage from being unintentionally sent to a third party.

Dec 20 '23 23:12 robotadam

We've discussed this during the Jan. 10th, 2024 TC issue triage meeting and here are the suggestions:

Having instrumentation libraries providing configurations to stop Baggage propagation for certain systems/services makes sense. Whether this will be a spec level thing or it'll be individual instrumentation libraries decision - is open for discussion and need more discussions/feedback.
Stopping trace id / parent id / tracestate propagation does not seem to be a common use case, according to the W3C TraceContext specification, having these information shouldn't cause any systems to stop working or misbehave, so we think it should be propagated by default even if certain external systems do not send the trace to the same backend.

@robotadam would you create a separate issue for Baggage so we can discussion more? Thanks!

Jan 10 '24 15:01 reyang

edit: this is a hack and not recommended, see https://github.com/open-telemetry/opentelemetry-specification/issues/3799#issuecomment-1919867950

Hello, a pragmatic approach, that would effectively solve most use cases, is to allow adding a "pre hook", thats to say hook into the beginning, to the default W3C Baggage propagator's inject method. This allows developers to early return if some condition is met, best example being if the http.url is determined to be that of a third party.

Thanks to function binding in JS this is achievable, but a more user friendly experience, of passing in a function to the default w3c baggage propagtor constructor is desired.

Jan 30 '24 19:01 naseemkullah

@robotadam to be clear, trace context is explicitly designed to work across multiple tracing implementations managed by multiple third parties: https://www.w3.org/TR/trace-context/#tracestate-header

There are cases where you want to drop trace context, but generally this should be handled on the way in, not on the way out. APIs designed to be called by untrusted third parties should drop trace context if they are not planning on sharing trace data. Baggage on the other hand should be dropped on the way out. Either way, I agree a filtering config would be nice. But I don't think that it has to be in core, this can just be a contrib package. @naseemkullah's propagator wrapper would be a reasonable way to implement all of this.

btw, @robotadam as an aside: in your web hook example, I'm surprised you wouldn't want those traces linked! If there was a problem in the web hook, it seems to me that understanding what the hell was triggering it would be valuable information. Better to have some missing spans than a missing origin story. :)

Jan 30 '24 21:01 tedsuo

Hello, a pragmatic approach, that would effectively solve most use cases, is to allow adding a "pre hook", thats to say hook into the beginning, to the default W3C Baggage propagator's inject method. This allows developers to early return if some condition is met, best example being if the http.url is determined to be that of a third party.

Thanks to function binding in JS this is achievable, but a more user friendly experience, of passing in a function to the default w3c baggage propagtor constructor is desired.

I am the JS maintainer and I DO NOT recommend this. It relies on abusing SDK internals which are meant to be private and could change at any time (that is why the cast is required). Span attributes are not readable. See https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#additional-span-interfaces

The API-level definition for Span's interface only defines write-only access to the span. This is good because instrumentations and applications are not meant to use the data stored in a span for application logic.

Jan 31 '24 20:01 dyladan

Sorry, I forgot to follow up on this. I've submitted #3957 for the baggage issue, and I'll close this one. Thanks, all.

Mar 22 '24 22:03 robotadam

opentelemetry-specification opentelemetry-specification copied to clipboard

Conditionally disable context in outbound instrumentation to stop it from reaching third parties

opentelemetry-specification
opentelemetry-specification copied to clipboard