
fix: Explore how to achieve telemetry suppression with OTLP

Open cijothomas opened this issue 5 months ago • 8 comments

One way of addressing https://github.com/open-telemetry/opentelemetry-rust/issues/2877

This PR does not introduce a “fix” inside the OTLP Exporters themselves, but instead demonstrates how users can address the issue without requiring changes in OpenTelemetry.

Background

OpenTelemetry provides a mechanism to suppress telemetry based on the current Context. However, this suppression only works if every component involved properly propagates OpenTelemetry’s Context. Libraries like tonic and hyper are not aware of OTel’s Context and therefore do not propagate it across threads.

As a result, OTel’s suppression can fail, leading to telemetry-induced-telemetry—where the act of exporting telemetry (e.g., sending data via tonic/hyper) itself generates additional telemetry. This newly generated telemetry is then exported again, triggering yet more telemetry in a loop, potentially overwhelming the system.

What this PR does

OTLP/gRPC exporters rely on the tonic client, which captures the current runtime at creation time and uses it to drive futures. Instead of reusing the application’s existing runtime, this PR creates a dedicated Tokio runtime exclusively for the OTLP Exporter.

In this dedicated runtime we: 1. Intercept the thread start/stop events. 2. Set OTel's suppression flag in the Context.

This ensures that telemetry generated by libraries such as hyper/tonic will be suppressed only within the exporter’s dedicated runtime. If those same libraries are used elsewhere for application logic, they continue to function normally and emit telemetry as expected.
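The per-thread scoping can be modeled with nothing but the standard library: a thread-local flag that an "exporter" thread sets as it starts. This is a simplified sketch of the idea only; in the PR the flag is OTel's Context suppression, entered from tokio's `Builder::on_thread_start` hook rather than a bare bool:

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Stand-in for OTel's suppression flag, which in the real setup lives
    // in the current opentelemetry Context.
    static SUPPRESSED: Cell<bool> = Cell::new(false);
}

fn is_suppressed() -> bool {
    SUPPRESSED.with(|s| s.get())
}

fn main() {
    // "Exporter" thread: suppression is switched on as the thread starts,
    // analogous to tokio's on_thread_start callback on the dedicated runtime.
    let exporter = thread::spawn(|| {
        SUPPRESSED.with(|s| s.set(true));
        is_suppressed() // telemetry emitted here would be dropped
    });

    // Application thread: the flag is thread-local, so it stays unset here
    // and telemetry flows normally.
    let app = thread::spawn(is_suppressed);

    assert!(exporter.join().unwrap());
    assert!(!app.join().unwrap());
    println!("suppression is scoped to the exporter's threads");
}
```

Because the flag is per thread, hyper/tonic used for application logic on other threads is unaffected, which is exactly the isolation the dedicated runtime buys.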

Depending on the feedback, we could either address this purely through documentation and examples, or we could enhance the OTLP Exporter itself to expose a feature flag that, when enabled, would automatically create the tonic client within its own dedicated runtime.

cijothomas avatar Jul 25 '25 16:07 cijothomas

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 80.1%. Comparing base (e9ca158) to head (54e05d5).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #3084   +/-   ##
=====================================
  Coverage   80.1%   80.1%           
=====================================
  Files        126     126           
  Lines      21957   21957           
=====================================
  Hits       17603   17603           
  Misses      4354    4354           

:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar Jul 25 '25 16:07 codecov[bot]

Although I can see that this works, it feels like a big leak of impl details into the user's domain. What would it look like as a helper in OTel itself? E.g. something like withTelemetrySuppression(_ => { /* setup otel here */ } ) ?

I expect this would still require the user to not use a tokio_main but rather explicitly create their runtime after setting up OTel using this helper to wrap, but this would still go a ways to make it feel a bit less leaky.

My other question is - do we impact any of our users by requiring a separate tokio runtime in this case, for instance, folks in resource-constrained environments?

scottgerring avatar Jul 29 '25 11:07 scottgerring

It feels like our first suggestion to users should be:

If you're not using tonic/hyper as an HTTP client in your app, simply tune your tracing subscriber to suppress their telemetry

Then this is only necessary for a subset of users

scottgerring avatar Jul 29 '25 11:07 scottgerring

> Although I can see that this works, it feels like a big leak of impl details into the user's domain. What would it look like as a helper in OTel itself? E.g. something like withTelemetrySuppression(_ => { /* setup otel here */ } ) ?
>
> I expect this would still require the user to not use a tokio_main but rather explicitly create their runtime after setting up OTel using this helper to wrap, but this would still go a ways to make it feel a bit less leaky.
>
> My other question is - do we impact any of our users by requiring a separate tokio runtime in this case, for instance, folks in resource-constrained environments?

Good point. This is already the case even without this PR! See https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-otlp/src/lib.rs#L112-L113 - the OTLP/gRPC exporter already requires a Tokio runtime: either it captures the current one (when the user has #[tokio::main]), or we explicitly ask users to create a runtime and do the OTLP instantiation inside it.

Exposing a helper/feature in the OTLP Exporter bloats the public API, and it would be less flexible than users handing a runtime to us (they can do other things inside thread start/stop besides the suppression, etc.).

At some point in the future, we could work with the tokio-tracing maintainers and see if we can agree on a mutual Context field for suppression, but this requires a lot of research and coordination. The approach shown in this PR is just a way for users to unblock themselves right now, without OTel/OTLP doing anything extra.

cijothomas avatar Aug 02 '25 02:08 cijothomas

> My other question is - do we impact any of our users by requiring a separate tokio runtime in this case, for instance, folks in resource-constrained environments?

Quite a valid point! A separate tokio runtime is not mandatory - it is only needed if users are not okay with filtering out the logs from hyper/tonic etc. entirely, and want to suppress them only when they originate from the OTLP export context. If a user does need that capability, asking them to create another runtime strains resources, but not by much: it's just one extra thread, sitting idle 99% of the time. We already have similar concerns with our BatchProcessor/PeriodicReader - by default they each create a separate thread instead of plugging into the user's existing runtime, though users can avoid this by opting into currently experimental features.

cijothomas avatar Aug 02 '25 02:08 cijothomas

> Exposing a helper/feature in OTLP Exporter bloats public API, and it'll be less flexible than users giving a runtime to us. (they can do other things inside thread_start/stop apart from just the suppression etc.) ... At some point in the future, we could work with tokio-tracing maintainers and see if we can agree on a mutual Context field for suppression, but this requires a lot of research and co-ordination. The approach shown in this PR is just a way for users to unblock themselves right now, without OTel/OTLP doing anything extra.

I reckon if we reasonably expect to be able to agree on a suppression mechanism in the future it makes sense to not extend the public API for now, although I have no concept of how big this effort would be!

> It is not mandatory to use a separate tokio runtime - it is only required if users are not okay with filtering the logs from hyper/tonic etc., and want to do it only when originating from the otlp export context

Good point - regular filtering is the "default" and this is an opt-in thing for folks who want to selectively keep some http client logging.

scottgerring avatar Aug 13 '25 06:08 scottgerring

> I reckon if we reasonably expect to be able to agree on a suppression mechanism in the future it makes sense to not extend the public API for now, although I have no concept of how big this effort would be!

A more universally agreed concept of Context would be nice, but it'll require a lot of work to drive something like that. Another alternative is for the clients we use (tonic/hyper etc.) to expose a way to opt out of their usual logging, with OTLP then opting out this way. That would also require some effort to drive across the clients we use!

cijothomas avatar Aug 13 '25 15:08 cijothomas

For what it's worth, I have been able to use this approach downstream in logfire to successfully suppress all export telemetry.

  • To avoid reqwest spawning a background thread outside of my control, I had to switch to the reqwest-client (async client) in opentelemetry-otlp.
  • Because that client needs a tokio runtime, I decided to just spawn a background tokio runtime inside the logfire SDK for the exporters. Using the approach here, I suppress all telemetry on that runtime's threads.
  • ...and similarly I needed to use the experimental async batch exporters, because the async reqwest client doesn't work in the background thread of the sync BatchSpanProcessor (etc.), as those threads don't have a tokio context.

https://github.com/pydantic/logfire-rust/pull/95
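Pieced together, that downstream setup suggests a Cargo configuration along these lines. This is a sketch only: the feature names, especially the one for the experimental async batch processors, are assumptions to verify against the crate versions in use:

```toml
[dependencies]
# Async HTTP export via reqwest, so no blocking-client background thread
# is spawned outside the SDK's control.
opentelemetry-otlp = { version = "*", features = ["http-proto", "reqwest-client"] }
# Experimental async batch processors that run on a provided tokio runtime
# (feature name is an assumption - check opentelemetry_sdk's docs).
opentelemetry_sdk = { version = "*", features = ["experimental_trace_batch_span_processor_with_async_runtime"] }
tokio = { version = "*", features = ["rt-multi-thread"] }
```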

davidhewitt avatar Aug 20 '25 16:08 davidhewitt