Stabilization - OTLP
This is a ticket to capture stabilization efforts around the OTLP crate. It will likely be composed of a review of the current state and some architectural documentation thereof, some proposed changes, and then the implementation of those changes.
We'll cover off everything in the compliance matrix for exporters here too.
Open Issues & PRs related to OTLP as of 2025-07-10
Issues (latest updated first)
| # | Title (abridged) | Updated | Area |
|---|---|---|---|
| #3045 | Simplify opentelemetry-proto (SDK decoupling, gRPC separation) |
2025-06-30 | Architecture |
| #2881 | Allow async interceptors for OTLP gRPC exporter | 2025-06-17 | Extensibility |
| #2802 | OTLP MetricExporter deadlock |
2025-05-19 | Metrics / runtime |
| #1987 | Broken-pipe errors when exporting via OTLP | 2025-03-31 | Stability |
| #2877 | Prevent telemetry-induced telemetry in OTLP exporter | 2025-03-27 | Performance |
| #2777 | Handle shutdown in OTLP/gRPC | 2025-03-11 | Lifecycle |
| #2602 | OTLP File Exporter | 2025-02-27 | Feature request |
| #1592 | OTLP Exporter builder ergonomics | 2024-12-14 | API |
| #2364 | 2024-11-27 | OTLP Exporter Retry & Persistence |
PRs (latest updated first)
| # | Title | Updated | Purpose |
|---|---|---|---|
| PR #3046 | SDK decoupling & gRPC separation | 2025-07-06 | Follow-up to issue #3045 |
| PR #2812 | Add shutdown for spans | 2025-07-04 | Lifecycle correctness |
| PR #2727 | Add Retry to OTLP exporter | 2025-06-25 | Spec compliance (RetryInfo) |
| PR #3003 | Guard against enabling multiple HTTP features | 2025-05-27 | Build-time safety |
| PR #2465 | Configure TLS via env vars | 2025-04-09 | Ease of use |
| PR #2491 | Fix (un)serialize edge-cases in metrics | 2025-03-17 | Data model |
| PR #2524 | OTLP support matrix doc | 2025-03-06 | Documentation |
| PR #2758 | Implement FromStr/Display for Protocol |
2025-03-06 | API polish |
I believe we're misaligned with these bits of the spec so far. Links to code where the thing would be happening and relevant bit of the spec.
Spec Compliance
| Rank | Spec Requirement (section) | Current Status | Action | Importance | Code | Spec |
|---|---|---|---|---|---|---|
| 1 | MUST: Exporters respect RetryInfo / Retry-After (Retry) |
gRPC & HTTP ignore retry mechanisms. | Implement exponential back-off based on metadata / header. | High | http/trace.rs#L49 / tonic/trace.rs#L87 | https://opentelemetry.io/docs/specs/otel/protocol/exporter/#retry |
| 2 | MUST: Logs SeverityNumber 1-24 mapping |
Exporter doesn't validate; may emit invalid values (regular SDK mapping is correct, but direct OTLP construction could bypass validation). | No action - existing indirection over serialization models sufficient | High | logs.rs#L148 | https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-severitynumber |
| 3 | SHOULD: Support gzip and zstd compression | HTTP path lacks all compression support (both gzip and zstd). | Implement compression for HTTP. | Medium | http/mod.rs#L47 | https://opentelemetry.io/docs/specs/otel/protocol/exporter/#compression |
| 4 | SHOULD: Partial success body SHOULD be logged / propagated | Both HTTP and gRPC discard partial success information. | Bubble up info via ExporterStatus. |
Medium | tonic/trace.rs#L90 | https://opentelemetry.io/docs/specs/otlp/#partial-success |
Additional Observations
Concurrent Export Limits
The OTLP specification recommends concurrent requests for high throughput:
implementations that need to achieve high throughput SHOULD support concurrent operations [...] The number of concurrent requests SHOULD be configurable.
As it stands we support concurrent requests, but do not provide a mechanism to limit them. In other languages this can be configured via the default HTTP client (e.g., java with OkHttp). Dotnet also considered what to do here some time ago.
Protocol Enum Ambiguity
OpenTelemetry Protocol Requirements: The OTLP specification supports multiple transport protocols: gRPC, HTTP/protobuf, and HTTP/JSON.
We model this using a fluent builder style API - first to select the transport protocol (note the feature gates):
https://github.com/open-telemetry/opentelemetry-rust/blob/c5fde17e7d17af9f2987b07f9d3f80c03291b7cc/opentelemetry-otlp/src/span.rs#L60-L61
https://github.com/open-telemetry/opentelemetry-rust/blob/c5fde17e7d17af9f2987b07f9d3f80c03291b7cc/opentelemetry-otlp/src/span.rs#L68-L70
Then, to select the encoding (rather confusingly called "protocol"):
https://github.com/open-telemetry/opentelemetry-rust/blob/c5fde17e7d17af9f2987b07f9d3f80c03291b7cc/opentelemetry-otlp/src/exporter/mod.rs#L258
https://github.com/open-telemetry/opentelemetry-rust/blob/c5fde17e7d17af9f2987b07f9d3f80c03291b7cc/opentelemetry-otlp/src/lib.rs#L440-L450
This means we can select impossible combinations which are silently remapped at runtime:
HTTP Transport + gRPC Protocol: with_http().with_protocol(Protocol::Grpc) uses HTTP transport with protobuf instead of actual gRPC wire format, silently accepting semantically incorrect configuration.
exporter/http/mod.rs:303 - Protocol::Grpc falls through to _ wildcard and gets HTTP+protobuf instead of failing
gRPC Transport + HTTP Protocol: with_tonic().with_protocol(Protocol::HttpJson) ignores protocol setting entirely, always uses gRPC regardless of specified protocol
exporter/tonic/mod.rs:140 - TonicExporterBuilder hard-codes protocol: Protocol::Grpc regardless of user configuration
Feature-Gated Runtime Behavior: Protocol::HttpJson compiles successfully but silently falls back to protobuf when http-json feature is disabled
exporter/http/mod.rs:298-299 - #[cfg(feature = "http-json")] arm disappears at compile time without the feature
Graceful gRPC Shutdown (#2777)
The current shutdown implementation simply takes the ClientInner via self.inner.take() without explicit client connection cleanup, and Tonic lacks direct connection closing APIs for clients. This can leave gRPC client connections and resources improperly cleaned up in production systems where exporters are frequently created and destroyed.
Exporter doesn't validate; Logs SeverityNumber 1-24 mapping
I am not sure if Exporter should do anything to validate..Any reason why you think exporter must do this? (In my view, Exporters job is to serialize and transport the data only, not ensuring the validity...)
Summary is legacy and may not be required).
Summary is not required. There is no way to produce them in the SDK also.
Summary is not required. There is no way to produce them in the SDK also.
Removed!
Exporter doesn't validate; Logs SeverityNumber 1-24 mapping
I am not sure if Exporter should do anything to validate..Any reason why you think exporter must do this? (In my view, Exporters job is to serialize and transport the data only, not ensuring the validity...)
I don't think it's strictly necessary, but the "robustness principle" would encourage us to. Having said that I can't see an ergonomic way of doing it - we want the types to be pub so that folks can use them directly, which we see in other requests to the project, and I don't think we want code in the exporters looping over the records and mutating values.
I'm working through the points in the compliance matrix to reconfirm what else we need. There's a few things in here it seems like we now support. @cijothomas is it worth updating the matrix?
| Feature | Rust | Complete? | Notes |
|---|---|---|---|
Exporter interface has ForceFlush |
- | ❌ | OTLP exporters lack ForceFlush implementation - see trace exporter |
| OTLP/HTTP JSON Protobuf Exporter | ✅ | Protocol::HttpJson handling with http-json feature |
|
| OTLP/HTTP gzip Content-Encoding support | ❌ | Only gRPC compression via gzip-tonic feature, HTTP lacks compression | |
| Honors retryable responses with backoff | ❌ | No retry logic in HTTP trace exporter | |
| Honors non-retryable responses | ❌ | All errors treated generically in HTTP exporters | |
| Honors throttling response | ❌ | No 429 throttling handling in HTTP exporters | |
| Multi-destination spec compliance | ❌ | Single endpoint per exporter in builder pattern | |
| SchemaURL in ResourceSpans and ScopeSpans | ✅ | Resource and Scope schema_url mapping in trace transformations | |
| SchemaURL in ResourceMetrics and ScopeMetrics | ✅ | Resource and Scope schema_url mapping in metrics transformations | |
| SchemaURL in ResourceLogs and ScopeLogs | ✅ | Resource and Scope schema_url mapping in logs transformations | |
| Honors the user agent spec | ✅ | Default User-Agent header construction with proper format | |
| Partial Success messages are handled and logged for OTLP/gRPC | ❌ | Proto definitions exist but no handling in gRPC trace exporter | |
| Partial Success messages are handled and logged for OTLP/HTTP | ❌ | Response parsing missing in HTTP trace exporter | |
| Metric Exporter configurable temporality preference | ✅ | MetricExporterBuilder::with_temporality() implementation | |
| Metric Exporter configurable default aggregation | ❌ | Missing AggregationSelector trait and builder configuration |
I'm working through the points in the compliance matrix to reconfirm what else we need. There's a few things in here it seems like we now support. @cijothomas is it worth updating the matrix?
Yes, let's update the matrix to keep it accurate. I also suggest to focus only on OTLP and ignore Zipkin and Prometheus now.
No in-memory test exporter provided in OTLP crate
Its not part of OTLP crate, and not required to be either.
Concurrent Export Limits
lets not worry about it for now in the OTLP Exporters, as upstream sdk (BatchProcessor) only calls export sequentially. We need to first address that issue, and then come back to add concurrent export in OTLP. https://github.com/open-telemetry/opentelemetry-rust/pull/3028 adds concurrent_export for Spans, but only for the experimental feature, not the default Batchprocessor.
Metric Exporter configurable default aggregation
I'd treat this as very low priority, and move it outside scope for stable release. It is not something we have seen asks for. (Also using Views + experimental feature, this can be achieved on a per metric basis)
Blocking for stable
Dealing with responses and failures ...
| Feature | Brief description (spec vs. current gap) | Issue / PR | Notes |
|---|---|---|---|
| Retry & Throttling | Exporter MUST honor RetryInfo / Retry‑After for both generic retryable errors and throttling signals (HTTP 429 / gRPC resource_exhausted), applying exponential back‑off. gRPC & HTTP paths ignore this today. |
#3126 | Done |
| Partial success | Exporter SHOULD surface and handle OTLP “partial success” codes & messages instead of silently discarding them. | #3206 | Done |
API cleanliness:
| Feature | Brief description (spec vs. current gap) | Issue / PR | Notes |
|---|---|---|---|
| Protocol enum ambiguity | Builder lets users pick impossible combos (e.g. HTTP + Protocol::Grpc) and silently remaps them. |
#2758 | PR polishes enum conversions; still missing upfront validation/error. |
Misc:
| Feature | Brief description (spec vs. current gap) | Issue / PR | Notes |
|---|---|---|---|
| gzip & zstd compression | HTTP exporter should offer both gzip and zstd; only gRPC supports gzip today. |
#3083 | Done |
| Telemetry‑induced telemetry guard | Exporter should avoid emitting telemetry for its own network calls to prevent feedback loops. | #2877 | Issue tracks guard logic; cross‑thread libs (hyper, tonic) still leak context. @cijothomas looking at this currently. |
Misc / non-blocking
| Feature | Brief description (spec vs. current gap) | Issue / PR | Notes |
|---|---|---|---|
| ForceFlush | Exporters should implement force_flush() per SDK spec; currently missing for all signals. |
– | Needs unified async/timeout design. |
| Multi‑destination endpoints | Spec allows exporters to target multiple OTLP endpoints; Rust builder supports only one. | – | Requires vector of endpoint configs plus round‑robin / fail‑over handling. |
@cijothomas I think that list at the bottom captures everything we've discussed being blocking. LMK if i've missed anything, otherwise it's time to start working on it!
#3110 also related
Removed #2364 as it is not specced and not blocking for OTLP stabilization.
We've done everything we identified as blocking based on the earlier walk through of the spec against the implementation. The big things from the spec gap analysis - retries, partial success, compression - are done. What are we missing?
Potentially related outstanding OTLP issues:
Maybe blocking
- #3045
- #2994 - should be a quick fix; i'll pick this up now
- #2877 - not sure of the status of this - @cijothomas / @bantonsson I know some work has gone into it ?
- #2802 - this is a bit nerve wracking
- #2262 - this looks almost done. i'll pick it up and close it.
Not Blocking
- #3123 - I don't think this is likely to force a public API change, so I don't think it should block
- #2881 - this is an enhancement request and goes beyond the spec
- #2576
@cijothomas @lalitb what of this is blocking to call OTLP stable ? What else are we missing?
Hi @scottgerring. Is there anything I can help contribute in? I'm new to the OTLP scene, so might need some guidance. But happy to help put in my time if I can get some mentoring
Hey @rakshith-ravi - i'm hoping everything blocking stability for OTLP is done but i'll ping you if it's not - thanks for asking! in the meantime you might wanna check out the good first issue issues :)