Control baggage propagation in automatic instrumentation libraries
This is a follow-up to #3799. In that thread it was decided that context should always be propagated, but baggage is potentially a problem.
What are you trying to achieve?
Given an application that uses an instrumentation for an HTTP or RPC client library (e.g. for the Python requests library), any baggage set by the primary application will also be sent to a third party, even if it is only intended for internal use. This can leak private internal data. OpenTelemetry may want to adopt a pattern that allows baggage to be sent only for some URLs, similar to the OTEL_PYTHON_REQUESTS_EXCLUDED_URLS environment variable.
Additional context.
This behavior was documented in https://github.com/open-telemetry/opentelemetry.io/pull/3530 and there was also discussion in https://github.com/open-telemetry/opentelemetry-specification/issues/1633 .
In the system I'm working on, we wouldn't be able to use both baggage and instrumentation libraries at the same time, because we need the client spans to properly debug and trace outgoing API calls, but we cannot allow baggage to be sent to these external APIs.
@jsuereth this seems related to what we talked about in the past: that the propagation API is not sufficient (I believe lambda propagation was the context of that conversation). There are definitely some cases in which the instrumentation has knowledge required in order to make effective propagation decisions.
I have added this to the Tuesday, April 9 Specification SIG agenda.
We should see if we can accomplish this purely through the W3C Baggage Propagator. The issue with having this be an instrumentation level configuration is that it will be difficult / impossible to enforce standard configuration for instrumentation which lives outside the OpenTelemetry org, which is something we expect more of as time goes by.
In theory, the W3C baggage propagator should be able to access the current client span from the context, and enable / disable propagation based on the contents, including the URL.
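To make the idea concrete, here is a minimal toy model of a propagator that consults the context before delegating. It deliberately does not use the OpenTelemetry API: the context is modeled as a plain dict, and `UrlFilteringPropagator`, `inject_baggage`, the `client_span_url` key, and the host names are all hypothetical. It also assumes the client span's URL is available in the context at inject time, which depends on the instrumentation.

```python
from typing import Callable, Dict, Mapping, Set
from urllib.parse import urlsplit

Carrier = Dict[str, str]
Context = Mapping[str, object]


def inject_baggage(carrier: Carrier, context: Context) -> None:
    """Simplified stand-in for a baggage propagator (no percent-encoding)."""
    baggage = context.get("baggage") or {}
    if baggage:
        carrier["baggage"] = ",".join(f"{k}={v}" for k, v in baggage.items())


class UrlFilteringPropagator:
    """Delegates injection only when the outgoing host is explicitly allowed."""

    def __init__(
        self,
        delegate: Callable[[Carrier, Context], None],
        allowed_hosts: Set[str],
    ) -> None:
        self._delegate = delegate
        self._allowed_hosts = allowed_hosts

    def inject(self, carrier: Carrier, context: Context) -> None:
        # Hypothetical key: the URL recorded for the current client span.
        url = context.get("client_span_url")
        host = urlsplit(url).hostname if isinstance(url, str) else None
        # A missing or unknown host fails closed: nothing is injected.
        if host in self._allowed_hosts:
            self._delegate(carrier, context)


propagator = UrlFilteringPropagator(inject_baggage, {"orders.internal.example"})

internal_headers: Carrier = {}
propagator.inject(
    internal_headers,
    {"baggage": {"tenant": "acme"}, "client_span_url": "https://orders.internal.example/v1"},
)

external_headers: Carrier = {}
propagator.inject(
    external_headers,
    {"baggage": {"tenant": "acme"}, "client_span_url": "https://api.payments.example/charge"},
)
```

Because the decision lives in the propagator rather than in each instrumentation, the same configuration would apply to instrumentation maintained outside the OpenTelemetry org.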
Is this not an issue for all propagators (not only W3C Baggage)?
During the Spec call of April 9th there was initial agreement on the need to support this, hence marking this as valid.
Would https://github.com/open-telemetry/opentelemetry-specification/issues/1633 be a more general issue to discuss this as something needed to be supported on the Propagators API to stop all context being propagated, not only baggage?
It looks like #1633 is now moving towards an instrumentation solution for both baggage and context, similar to my original post in #3799.
If there is consensus in the upcoming OTEP we can close this.
Opt-out (blacklisting) mechanisms like OTEL_PYTHON_REQUESTS_EXCLUDED_URLS, OTEL_PYTHON_EXCLUDED_URLS, and OTEL_PYTHON_AIOHTTP_CLIENT_EXCLUDED_URLS are INHERENTLY INSECURE.
This is because they are error-prone. For example, due to the cross-cutting nature of OpenTelemetry instrumentation, it's very easy to miss new functionality that makes REST API calls to a 3rd party being added to a larger application by a developer who is unaware of the need to exclude certain URLs from propagation. Or there might be a minor misspelling in the opt-out list. As a result, trace and baggage data may flow to those services for a long time before being discovered, if ever!
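The two failure modes can be illustrated with a small sketch. It is not the matching logic of any OpenTelemetry instrumentation (those variables take URL patterns, while this compares host names for simplicity), and the function names and hosts are hypothetical. It only shows what a single typo does under each approach.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical hosts, for illustration only.
INTERNAL_URL = "https://orders.internal.example/v1"
THIRD_PARTY_URL = "https://api.payments.example/charge"


def propagate_with_denylist(url: str, excluded_hosts: set[str]) -> bool:
    """Opt-out: propagate unless the host is explicitly excluded."""
    return urlsplit(url).hostname not in excluded_hosts


def propagate_with_allowlist(url: str, allowed_hosts: set[str]) -> bool:
    """Opt-in: propagate only if the host is explicitly allowed."""
    return urlsplit(url).hostname in allowed_hosts


# Each list contains a one-character typo ("paymets", "interal").
denylist_with_typo = {"api.paymets.example"}
allowlist_with_typo = {"orders.interal.example"}

# The denylist typo fails open: the third party still receives the data.
leaks_to_third_party = propagate_with_denylist(THIRD_PARTY_URL, denylist_with_typo)

# The allowlist typo fails closed: the internal call loses propagation,
# which is visible as a broken trace, but nothing leaks externally.
reaches_internal = propagate_with_allowlist(INTERNAL_URL, allowlist_with_typo)
reaches_third_party = propagate_with_allowlist(THIRD_PARTY_URL, allowlist_with_typo)
```

With an opt-in list, the same mistake shows up as missing data in your own system rather than as a silent leak to someone else's.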
See https://github.com/open-telemetry/opentelemetry-python-contrib/issues/3906 for more details.
Another piece of feedback regarding OTEL_PYTHON_EXCLUDED_URLS and similar mechanisms, looking at the aiohttp implementation of exclude_urls as an example: https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation/opentelemetry-instrumentation-aiohttp-client/src/opentelemetry/instrumentation/aiohttp_client/init.py
If a URL is excluded, then it appears that not only will the trace/baggage not be propagated to the downstream remote, but no span will be created around the REST API call at all, meaning that we won't even be able to observe the span in our own Tempo db or similar. If I understood this correctly, this seems very wrong on so many different levels!
What is absolutely needed is a configurable ability to reliably prevent propagation of trace and/or baggage to the remote, while still being able to monitor the span in our own systems (see how long the request took to execute, exception/error info, etc.) without having to hack around this.
Finally, consider a use case where only baggage contains sensitive information, so we may need to block baggage propagation but still allow trace propagation to certain targets for end-to-end cross-system debugging.
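As a rough sketch of what such a configuration could express, the following models a per-target policy that decides trace and baggage propagation independently. This is not an existing OpenTelemetry feature. `PropagationPolicy`, `POLICIES`, `headers_for`, and the host names are hypothetical; only the `traceparent` and `baggage` header names come from the W3C specifications. The sketch only chooses outgoing headers, on the assumption that the client span is still created and exported locally regardless.

```python
from dataclasses import dataclass
from typing import Dict
from urllib.parse import urlsplit


@dataclass(frozen=True)
class PropagationPolicy:
    """What may be sent to a target. Both default to False (opt-in)."""

    trace: bool = False
    baggage: bool = False


# Hypothetical per-host configuration.
POLICIES: Dict[str, PropagationPolicy] = {
    "orders.internal.example": PropagationPolicy(trace=True, baggage=True),
    "partner.example": PropagationPolicy(trace=True, baggage=False),
}
DEFAULT_POLICY = PropagationPolicy()


def headers_for(url: str, traceparent: str, baggage: str) -> Dict[str, str]:
    """Return only the propagation headers the target's policy permits."""
    host = urlsplit(url).hostname or ""
    policy = POLICIES.get(host, DEFAULT_POLICY)
    headers: Dict[str, str] = {}
    if policy.trace:
        headers["traceparent"] = traceparent
    if policy.baggage and baggage:
        headers["baggage"] = baggage
    return headers


TRACEPARENT = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
BAGGAGE = "tenant=acme"

internal = headers_for("https://orders.internal.example/v1", TRACEPARENT, BAGGAGE)
partner = headers_for("https://partner.example/debug", TRACEPARENT, BAGGAGE)
unknown = headers_for("https://api.payments.example/charge", TRACEPARENT, BAGGAGE)
```

Here the partner target receives trace context for cross-system debugging but no baggage, and an unlisted target receives neither.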