Improve error reporting and debugging UX for the OTLP default and HTTP exporters
Addresses: https://github.com/open-telemetry/opentelemetry-ruby/issues/1931
This PR significantly enhances the debugging experience for OTLP exporters by:
- Adding rich context to export failure results
- Introducing comprehensive debug-level logging throughout the export pipeline
- Maintaining full backwards compatibility with existing exporter implementations
These changes helped me debug a particularly gnarly issue: a slightly outdated version of the sentry-ruby SDK parsed IPv6 addresses incorrectly, which interfered with how the OpenTelemetry Ruby SDK bubbled up errors. As a result, all of my traces were dropped, and the only clue was the one-line error "Unable to export X spans".
Reviewer's Note
Significant AI assistance was used in the process of getting this PR working.
Motivation
Previously, when OTLP exports failed, developers had minimal information to diagnose the root cause. The exporters simply returned a FAILURE constant without any context about:
- What type of error occurred
- HTTP response codes and messages
- Response bodies from the collector
- Retry attempts and their outcomes
- Exception details
This made troubleshooting production issues extremely difficult, especially for:
- Network connectivity problems
- SSL/TLS certificate issues
- Collector endpoint configuration errors
- HTTP timeout scenarios
- Server-side errors (4xx/5xx responses)
Changes
1. Enhanced Export Result Type (sdk/lib/opentelemetry/sdk/trace/export.rb)
Introduced a new ExportResult class that wraps result codes with optional error context:
```ruby
class ExportResult
  attr_reader :code, :error, :message

  # Factory methods (implementations omitted in this summary)
  def self.success; end
  def self.failure(error: nil, message: nil); end
  def self.timeout; end
end
```
Backwards Compatibility: The ExportResult class overrides == and defines to_i so that existing code comparing results against the SUCCESS, FAILURE, or TIMEOUT constants keeps working unchanged.
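To make the compatibility claim concrete, here is a minimal, self-contained sketch of how such a result object could carry error context and still compare equal to bare integer codes. It is not the PR's implementation: the ExportSketch module is hypothetical, and the values 0/1/2 are assumed to mirror the SDK's existing SUCCESS/FAILURE/TIMEOUT constants.

```ruby
# Hypothetical sketch (not the PR's implementation): a result object that
# carries error context but still compares equal to the legacy integer codes.
module ExportSketch
  # Assumed to mirror the SDK's existing integer result codes.
  SUCCESS = 0
  FAILURE = 1
  TIMEOUT = 2

  class ExportResult
    attr_reader :code, :error, :message

    def initialize(code, error: nil, message: nil)
      @code = code
      @error = error
      @message = message
    end

    def self.success
      new(SUCCESS)
    end

    def self.failure(error: nil, message: nil)
      new(FAILURE, error: error, message: message)
    end

    def self.timeout
      new(TIMEOUT)
    end

    # Equal to another ExportResult with the same code, or to a bare
    # integer code, so `result == SUCCESS` keeps working for old callers.
    def ==(other)
      case other
      when ExportResult then code == other.code
      when Integer then code == other
      else false
      end
    end

    # Lets callers that need a plain integer code call to_i.
    def to_i
      code
    end
  end
end

result = ExportSketch::ExportResult.failure(message: 'HTTP 503 (Service Unavailable)')
puts result == ExportSketch::FAILURE # prints "true"
puts result.to_i                     # prints "1"
```

Wrapping the code in an object (rather than, say, attaching state to the integer constants, which are shared and immutable) keeps each failure's context separate while the == override preserves the old comparison style.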
2. Comprehensive Debug Logging
Added detailed debug-level logging at key points in the export pipeline:
Entry/Exit Points
- Function entry with parameters (span count, timeout values)
- Function exit with return values
- Byte sizes (compressed vs uncompressed)
HTTP Request Flow
- Request preparation and compression
- Timeout calculations and retry counts
- HTTP response codes and messages
- Response bodies for error cases
Exception Handling
- Exception type and message for all caught exceptions
- Retry attempt tracking
- Max retry exceeded scenarios
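As an illustration of the entry and byte-size logging listed above, the following sketch uses Ruby's standard Logger with its block form, so message strings are only built when the DEBUG level is enabled. The PayloadPreparer class and its prepare method are hypothetical stand-ins, and joining span names with newlines stands in for protobuf encoding; this is not the exporter's actual code.

```ruby
require 'logger'
require 'stringio'
require 'zlib'

# Hypothetical sketch: entry and byte-size debug logging around a
# payload-preparation step. Class and method names are illustrative only.
class PayloadPreparer
  def initialize(logger)
    @logger = logger
  end

  def prepare(spans, timeout: 30.0)
    # Block form: the string is only built when the logger's level allows DEBUG,
    # so the logging costs little when debug output is disabled.
    @logger.debug { "PayloadPreparer#prepare: Called with #{spans.size} spans, timeout=#{timeout}" }

    body = spans.join("\n") # stand-in for protobuf encoding
    gzipped = Zlib.gzip(body)
    @logger.debug do
      "PayloadPreparer#prepare: #{body.bytesize} bytes uncompressed, #{gzipped.bytesize} bytes compressed"
    end

    @logger.debug { 'PayloadPreparer#prepare: Returning gzipped payload' }
    gzipped
  end
end

io = StringIO.new
logger = Logger.new(io, level: Logger::DEBUG)
payload = PayloadPreparer.new(logger).prepare(%w[span-a span-b])
puts io.string
```

Raising the logger's level to INFO silences all three lines without touching the call sites.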
3. Rich Failure Context
All failure scenarios now return detailed context via Export.failure():
HTTP Error Responses
```ruby
OpenTelemetry::SDK::Trace::Export.failure(
  message: "export failed with HTTP #{response.code} (#{response.message}) after #{retry_count} retries: #{body}"
)
```
Network Exceptions
```ruby
OpenTelemetry::SDK::Trace::Export.failure(
  error: e,
  message: "export failed due to SocketError after #{retry_count} retries: #{e.message}"
)
```
Timeout Scenarios
```ruby
OpenTelemetry::SDK::Trace::Export.failure(
  message: 'timeout exceeded before sending request'
)
```
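The three failure shapes above can be seen together in a bounded retry loop. The sketch below is a simplified, hypothetical stand-in for the exporters' send path: Result, Response, MAX_RETRIES and send_with_context are illustrative names, the block replaces the real HTTP call, and only SocketError is retried here (which statuses and exceptions are retryable is a separate policy choice).

```ruby
require 'socket' # defines SocketError

# Hypothetical sketch: a bounded retry loop that turns every failure path into
# a result carrying a message (and the exception, when there is one).
# Result, Response, MAX_RETRIES and send_with_context are illustrative names.
Result = Struct.new(:code, :error, :message, keyword_init: true)
Response = Struct.new(:code, :message, :body)
MAX_RETRIES = 3

def send_with_context(deadline:)
  retry_count = 0
  begin
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    return Result.new(code: :failure, message: 'timeout exceeded before sending request') if now >= deadline

    response = yield # stand-in for the HTTP call
    return Result.new(code: :success) if response.code.start_with?('2')

    # Non-2xx: report status, reason and body instead of a bare failure code.
    Result.new(code: :failure,
               message: "export failed with HTTP #{response.code} (#{response.message}) " \
                        "after #{retry_count} retries: #{response.body}")
  rescue SocketError => e
    if retry_count < MAX_RETRIES
      retry_count += 1
      retry # re-runs the begin block, including the deadline check
    end
    # Keep the exception object so the caller can log its class and message.
    Result.new(code: :failure, error: e,
               message: "export failed due to SocketError after #{retry_count} retries: #{e.message}")
  end
end

deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 10
outcome = send_with_context(deadline: deadline) { raise SocketError, 'getaddrinfo: Name or service not known' }
puts outcome.message
```

Because the deadline is checked at the top of every attempt, a retry that would start after the deadline yields the timeout message rather than another network call.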
4. Enhanced BatchSpanProcessor Error Reporting
Updated BatchSpanProcessor to extract and log error context:
```ruby
def report_result(result_code, span_array, error: nil, message: nil)
  if result_code == SUCCESS
    # ... metrics ...
  else
    error_message = if error
                      "BatchSpanProcessor: export failed due to #{error.class}: #{error.message}"
                    elsif message
                      "BatchSpanProcessor: export failed: #{message}"
                    else
                      "BatchSpanProcessor: export failed (no error details available) \n Call stack: #{caller.join("\n")}"
                    end
    OpenTelemetry.handle_error(exception: ExportError.new(span_array), message: error_message)
  end
end
```
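Since report_result takes error: and message: as separate keywords, the caller has to pull them off the result first while still tolerating exporters that return a bare integer. One way to do that is duck typing with respond_to?. The following is a minimal sketch under that assumption; RichResult and describe_failure are hypothetical names, not SDK API, and the call-stack fallback is left out for brevity.

```ruby
# Hypothetical sketch: derive a log message from either a rich result object
# or a legacy bare integer code. RichResult and describe_failure are
# illustrative names, not SDK API.
RichResult = Struct.new(:code, :error, :message, keyword_init: true)

def describe_failure(result)
  # Legacy exporters return plain Integers, which respond to neither method.
  error = result.respond_to?(:error) ? result.error : nil
  message = result.respond_to?(:message) ? result.message : nil

  if error
    "BatchSpanProcessor: export failed due to #{error.class}: #{error.message}"
  elsif message
    "BatchSpanProcessor: export failed: #{message}"
  else
    'BatchSpanProcessor: export failed (no error details available)'
  end
end

puts describe_failure(1)
puts describe_failure(RichResult.new(code: 1, message: 'HTTP 503 (Service Unavailable)'))
puts describe_failure(RichResult.new(code: 1, error: IOError.new('closed stream')))
```

Preferring the exception over the message mirrors the order of the branches in report_result above: an exception's class name is usually the most specific clue.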
5. Updated Exporters
Applied the same changes to both exporters:
- OTLP default exporter (exporter/otlp/lib/opentelemetry/exporter/otlp/exporter.rb)
- OTLP HTTP exporter (exporter/otlp-http/lib/opentelemetry/exporter/otlp/http/trace_exporter.rb)
Both now capture exception objects and maintain the error context through the entire export pipeline.
Example Scenarios
Before
```
ERROR -- : OpenTelemetry error: Unable to export 10 spans
```
After (with debug logging enabled)
```
DEBUG -- : OTLP::Exporter#export: Called with 10 spans, timeout=30.0
DEBUG -- : OTLP::Exporter#export: Calling encode for 10 spans
DEBUG -- : OTLP::Exporter#send_bytes: Sending HTTP request
DEBUG -- : OTLP::Exporter#send_bytes: Caught SocketError: Connection refused, retry_count=1
DEBUG -- : OTLP::Exporter#send_bytes: Max retries exceeded for SocketError
ERROR -- : BatchSpanProcessor: export failed due to SocketError: Connection refused - connect(2) for "localhost" port 4318
ERROR -- : OpenTelemetry error: Unable to export 10 spans
```
Random passerby here ~ just want to say thank you @chen-anders! I am knee-deep debugging errors between my ruby app and my OTLP collector and the improvements in this PR would vastly help my efforts.