fix: Explore how to achieve telemetry suppression with OTLP
One way of addressing https://github.com/open-telemetry/opentelemetry-rust/issues/2877
This PR does not introduce a “fix” inside the OTLP Exporters themselves, but instead demonstrates how users can address the issue without requiring changes in OpenTelemetry.
Background
OpenTelemetry provides a mechanism to suppress telemetry based on the current Context. However, this suppression only works if every component involved properly propagates OpenTelemetry’s Context. Libraries like tonic and hyper are not aware of OTel’s Context and therefore do not propagate it across threads.
As a result, OTel’s suppression can fail, leading to telemetry-induced-telemetry—where the act of exporting telemetry (e.g., sending data via tonic/hyper) itself generates additional telemetry. This newly generated telemetry is then exported again, triggering yet more telemetry in a loop, potentially overwhelming the system.
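For reference, the suppression mechanism is a flag carried on `Context` (a minimal sketch, assuming the `enter_telemetry_suppressed_scope` / `is_current_telemetry_suppressed` APIs from the opentelemetry crate's suppression work; exact names may differ by version):

```rust
use opentelemetry::Context;

fn main() {
    // Entering a suppressed scope marks the *current thread's* Context;
    // SDK components that honor the flag will stay quiet on this thread.
    let _guard = Context::enter_telemetry_suppressed_scope();
    assert!(Context::is_current_telemetry_suppressed());

    // But if tonic/hyper move the export work to another thread without
    // propagating this Context, that thread sees an unsuppressed Context,
    // and the exporter's own traffic produces telemetry again.
    std::thread::spawn(|| {
        assert!(!Context::is_current_telemetry_suppressed());
    })
    .join()
    .unwrap();
}
```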
What this PR does
OTLP/gRPC exporters rely on the tonic client, which captures the current runtime at creation time and uses it to drive futures. Instead of reusing the application’s existing runtime, this PR creates a dedicated Tokio runtime exclusively for the OTLP Exporter.
In this dedicated runtime:
1. We intercept the `on_thread_start` / `on_thread_stop` events.
2. We set OTel's suppression flag in the `Context` for each worker thread.
This ensures that telemetry generated by libraries such as hyper/tonic will be suppressed only within the exporter’s dedicated runtime. If those same libraries are used elsewhere for application logic, they continue to function normally and emit telemetry as expected.
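A minimal sketch of that wiring (assuming tokio's `on_thread_start`/`on_thread_stop` builder hooks and the `Context` suppression guard; the PR's actual code may differ in details):

```rust
use std::cell::RefCell;

use opentelemetry::{Context, ContextGuard};

thread_local! {
    // Holds the suppression guard for the lifetime of each worker thread.
    static SUPPRESSION_GUARD: RefCell<Option<ContextGuard>> = RefCell::new(None);
}

fn build_exporter_runtime() -> tokio::runtime::Runtime {
    // Dedicated runtime used only by the OTLP exporter; every worker
    // thread enters a suppressed Context scope as soon as it starts.
    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(1)
        .enable_all()
        .on_thread_start(|| {
            SUPPRESSION_GUARD.with(|g| {
                *g.borrow_mut() = Some(Context::enter_telemetry_suppressed_scope());
            });
        })
        .on_thread_stop(|| {
            // Drop the guard when the worker thread shuts down.
            SUPPRESSION_GUARD.with(|g| {
                g.borrow_mut().take();
            });
        })
        .build()
        .expect("failed to build dedicated OTLP runtime")
}
```

Parking the guard in a thread-local makes it live exactly as long as the worker thread, which is what keeps everything driven by this runtime suppressed.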
Depending on the feedback, we could either address this purely through documentation and examples, or we could enhance the OTLP Exporter itself to expose a feature flag that, when enabled, would automatically create the tonic client within its own dedicated runtime.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 80.1%. Comparing base (e9ca158) to head (54e05d5).
Additional details and impacted files
```
@@           Coverage Diff           @@
##            main    #3084   +/-   ##
=======================================
  Coverage   80.1%    80.1%
=======================================
  Files        126      126
  Lines      21957    21957
=======================================
  Hits       17603    17603
  Misses      4354     4354
```
Although I can see that this works, it feels like a big leak of impl details into the user's domain. What would it look like as a helper in OTel itself? E.g. something like `withTelemetrySuppression(_ => { /* setup otel here */ })`?
I expect this would still require the user to not use `tokio::main` but rather explicitly create their runtime after setting up OTel, using this helper to wrap, but this would still go a ways to make it feel a bit less leaky.
My other question is - do we impact any of our users by requiring a separate tokio runtime in this case, for instance, folks in resource-constrained environments?
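For concreteness, such a helper might look roughly like this in Rust (entirely hypothetical: neither the name `with_telemetry_suppression` nor any such API exists in OTel today):

```rust
use tokio::runtime::Runtime;

/// Hypothetical helper shape: run the user's OTel setup on a dedicated
/// runtime whose worker threads have suppression pre-applied, and hand
/// both the runtime and the setup result back to the caller.
fn with_telemetry_suppression<T>(setup: impl FnOnce() -> T) -> (Runtime, T) {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(1)
        .enable_all()
        // the on_thread_start/on_thread_stop suppression hooks from
        // this PR would be installed here
        .build()
        .expect("failed to build runtime");
    // Run setup inside the runtime's context so e.g. tonic clients
    // created during setup capture this runtime's handle.
    let result = runtime.block_on(async { setup() });
    (runtime, result)
}
```

The application would then keep the returned `Runtime` alive for the life of the process.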
It feels like our first suggestion to users should be:

> If you're not using tonic/hyper as an HTTP client in your app, simply tune your tracing subscriber to suppress their telemetry.

Then this is only necessary for a subset of users.
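That tuning might look like this (a sketch with `tracing_subscriber`'s `EnvFilter`; the exact target names to silence depend on the app's HTTP stack):

```rust
use tracing_subscriber::{fmt, prelude::*, EnvFilter};

fn main() {
    // Silence the HTTP stack at the subscriber level so the exporter's
    // own network activity never produces telemetry in the first place.
    let filter = EnvFilter::new("info")
        .add_directive("hyper=off".parse().expect("valid directive"))
        .add_directive("tonic=off".parse().expect("valid directive"))
        .add_directive("h2=off".parse().expect("valid directive"));

    tracing_subscriber::registry()
        .with(filter)
        .with(fmt::layer()) // an OTel bridge layer would be added the same way
        .init();
}
```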
> Although I can see that this works, it feels like a big leak of impl details into the user's domain. What would it look like as a helper in OTel itself? E.g. something like `withTelemetrySuppression(_ => { /* setup otel here */ })`? I expect this would still require the user to not use `tokio::main` but rather explicitly create their runtime after setting up OTel, using this helper to wrap, but this would still go a ways to make it feel a bit less leaky.
Good point. This is already the case even without this PR! See https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-otlp/src/lib.rs#L112-L113 - the OTLP/gRPC Exporter already requires a tokio runtime: either it captures the current one if the user has `tokio::main`, or we explicitly ask users to create a runtime and do the OTLP instantiation inside it.
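For illustration, the instantiation-inside-a-runtime pattern (a sketch; builder API per recent opentelemetry-otlp versions, with `runtime` being a dedicated runtime like the one sketched above):

```rust
// Building the tonic-based exporter inside the dedicated runtime makes
// the gRPC client capture this runtime's handle, so its export futures
// run on the suppression-aware worker threads rather than the app's.
let exporter = runtime.block_on(async {
    opentelemetry_otlp::SpanExporter::builder()
        .with_tonic()
        .build()
        .expect("failed to build OTLP span exporter")
});
```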
Exposing a helper/feature in the OTLP Exporter bloats the public API, and it'll be less flexible than users giving a runtime to us (they can do other things inside `on_thread_start`/`on_thread_stop` apart from just the suppression, etc.).
At some point in the future, we could work with tokio-tracing maintainers and see if we can agree on a mutual `Context` field for suppression, but this requires a lot of research and co-ordination. The approach shown in this PR is just a way for users to unblock themselves right now, without OTel/OTLP doing anything extra.
> My other question is - do we impact any of our users by requiring a separate tokio runtime in this case, for instance, folks in resource-constrained environments?
Quite valid point! It is not mandatory to use a separate tokio runtime - it is only needed if users are not okay with blanket-filtering the logs from hyper/tonic etc., and instead want to suppress them only when they originate from the OTLP export context. But if a user needs that capability, asking them to create another runtime does add some resource cost, though not much - it's just one extra thread, sitting idle 99% of the time. We already have similar concerns with our BatchProcessor/PeriodicReader - they each create a separate thread by default instead of plugging into the user's existing runtime, though users can avoid this by opting into currently experimental features.
> Exposing a helper/feature in the OTLP Exporter bloats the public API, and it'll be less flexible than users giving a runtime to us (they can do other things inside `on_thread_start`/`on_thread_stop` apart from just the suppression, etc.). ... At some point in the future, we could work with tokio-tracing maintainers and see if we can agree on a mutual `Context` field for suppression, but this requires a lot of research and co-ordination. The approach shown in this PR is just a way for users to unblock themselves right now, without OTel/OTLP doing anything extra.
I reckon if we reasonably expect to be able to agree on a suppression mechanism in the future it makes sense to not extend the public API for now, although I have no concept of how big this effort would be!
> It is not mandatory to use a separate tokio runtime - it is only needed if users are not okay with blanket-filtering the logs from hyper/tonic etc., and instead want to suppress them only when they originate from the OTLP export context.
Good point - regular filtering is the "default" and this is an opt-in thing for folks who want to selectively keep some http client logging.
> I reckon if we reasonably expect to be able to agree on a suppression mechanism in the future it makes sense to not extend the public API for now, although I have no concept of how big this effort would be!
A more universally agreed concept of Context would be nice, but it'll require a lot of work to drive something like that.
Another alternative is for the clients we use (tonic/hyper etc.) to expose a way to opt out of their usual logging, and then have OTLP opt out that way. That'd also require some effort to drive across the clients we use!
For what it's worth, I have been able to use this approach downstream in logfire to successfully suppress all export telemetry.
- To avoid `reqwest` spawning a background thread outside of my control, I had to switch to the `reqwest-client` (async client) in `opentelemetry-otlp`.
- Because that client needs a tokio runtime, I decided to just spawn a background `tokio` runtime inside the logfire SDK for the exporters. Using the approach here, I suppress all telemetry on that runtime's threads.
- ... and similarly I needed to use the experimental `async` batch exporters, because the async reqwest client doesn't work in the background thread of the sync `BatchSpanExporter` (etc.), since those threads don't have a tokio context.
https://github.com/pydantic/logfire-rust/pull/95