Trace exporter errors but metric exporter succeeds
Summary
The OTLPMetricExporter reaches the Collector but the OTLPSpanExporter fails with this error (formatted for readability):
E1130 15:37:07.081350000 6174420992 ssl_transport_security.cc:1511]
Handshake failed with fatal error SSL_ERROR_SSL:
error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
Unfortunately, the error is opaque and does not include remediation instructions.
Environment
Python 3.11.6 on macOS Sonoma 14.1.1
Steps to reproduce
Run a demo app exporting metrics and traces to a collector.
Instrumentation setup
import os

from opentelemetry import metrics, trace
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

OTLP_ENDPOINT = os.getenv("OTLP_ENDPOINT")

# Resource i.e. whoami
resource = Resource(attributes={
    SERVICE_NAME: "demo"
})

# Traces
tprovider = TracerProvider(resource=resource)
otlp_processor = BatchSpanProcessor(OTLPSpanExporter(endpoint=OTLP_ENDPOINT))
tprovider.add_span_processor(otlp_processor)
trace.set_tracer_provider(tprovider)
tracer = trace.get_tracer("demo-tracer")

# Metrics
otlp_metric_reader = PeriodicExportingMetricReader(OTLPMetricExporter(endpoint=OTLP_ENDPOINT))
mprovider = MeterProvider(resource=resource, metric_readers=[otlp_metric_reader])
metrics.set_meter_provider(mprovider)
meter = metrics.get_meter("demo-meter")
Instrumentation definition
import functools
import time

import flask
from opentelemetry.semconv.trace import SpanAttributes

latency_histogram = meter.create_histogram(
    "response_time",
    unit="ms",
    description="Time spent in a request",
)
call_count = meter.create_counter(
    "call_count",
    unit="1",
    description="Number of requests",
)

def instrument(func):
    """This decorator instruments a Flask route to create a span for the request.

    The span tracks the request's method, scheme, host, path, etc.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        req = flask.request
        _t0 = time.time_ns()
        with tracer.start_as_current_span(func.__name__) as span:
            span.set_attribute(SpanAttributes.HTTP_METHOD, req.method)
            span.set_attribute(SpanAttributes.HTTP_SCHEME, req.scheme)
            span.set_attribute(SpanAttributes.HTTP_HOST, req.host)
            span.set_attribute(SpanAttributes.HTTP_TARGET, req.path)
            span.set_attribute(SpanAttributes.HTTP_USER_AGENT, req.user_agent.string)
            span.set_attribute(SpanAttributes.HTTP_CLIENT_IP, req.remote_addr)
            span.set_attribute(SpanAttributes.HTTP_FLAVOR, req.environ.get('SERVER_PROTOCOL'))
            span.set_attribute(SpanAttributes.NET_HOST_PORT, req.environ.get('SERVER_PORT'))
            span.set_attribute(SpanAttributes.NET_HOST_NAME, req.environ.get('SERVER_NAME'))
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                span.set_status(trace.Status(trace.StatusCode.ERROR))
                span.record_exception(exc)
                raise
            else:
                span.set_status(trace.Status(trace.StatusCode.OK))
        # The span has ended here, so its start/end timestamps (in nanoseconds) are set.
        _t1 = time.time_ns()
        _req_dur = (_t1 - _t0) / 1e6  # wall-clock duration in ms
        request_duration = (span.end_time - span.start_time) / 1e6  # span duration in ms
        assert request_duration / _req_dur > 0.1, "request duration is not recorded in nanoseconds"
        latency_histogram.record(request_duration)
        call_count.add(1)
        return result
    return wrapper
Make a Flask app
import random

from flask import Flask

app = Flask(__name__)

@app.route('/')
@instrument
def index():
    sec = 0.225 + random.random() * 0.03
    time.sleep(sec)
    print(f"slept {sec * 1000.} ms")
    return "OK"
Run the app
In one terminal:
OTLP_ENDPOINT=127.0.0.1:4317 flask run --port 8080
Send requests
In another terminal:
while true
curl localhost:8080
sleep 1
end
Run the collector
In a third terminal, from the docs:
docker pull otel/opentelemetry-collector:0.90.0
docker run -p 127.0.0.1:4317:4317 -p 127.0.0.1:55679:55679 otel/opentelemetry-collector:0.90.0
Note that port 4317, the collector's OTLP gRPC endpoint, is exposed.
Expected behavior
- Spans arrive in the collector
- Metrics arrive in the collector
Actual behavior
The metrics appear to arrive successfully, but the spans fail with an OpenSSL error.
Span exporter
I observe this error in the Demo App console logs:
E1130 15:37:07.081350000 6174420992 ssl_transport_security.cc:1511] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
WARNING:opentelemetry.exporter.otlp.proto.grpc.exporter:Transient error StatusCode.UNAVAILABLE encountered while exporting traces to 127.0.0.1:4317, retrying in 1s.
Metric exporter
While it doesn't include the description, unit, or other metadata from the metric definition, the latency histogram does appear in the Collector's console logs. This indicates that metrics are successfully exported over OTLP from the Demo App to the Collector. For example, here is a snippet of the Collector console logs:
Metric #9
Descriptor:
-> Name: otelcol_processor_batch_batch_send_size
-> Description: Number of units in the batch
-> Unit:
-> DataType: Histogram
-> AggregationTemporality: Cumulative
HistogramDataPoints #0
Data point attributes:
-> processor: Str(batch)
-> service_instance_id: Str(2d791575-cdb2-48b1-8916-8b01696690d0)
-> service_name: Str(otelcol)
-> service_version: Str(0.90.0)
StartTimestamp: 2023-11-30 20:21:51.783 +0000 UTC
Timestamp: 2023-11-30 20:48:31.787 +0000 UTC
Count: 161
Sum: 2571.000000
ExplicitBounds #0: 10.000000
ExplicitBounds #1: 25.000000
ExplicitBounds #2: 50.000000
...
Buckets #0, Count: 0
Buckets #1, Count: 161
...
Additional context
One more thing to note: although the metrics arrive, they appear to indicate that each request takes between 25 and 50 ms. However, I log the request time independently, and the request times for this Demo App are consistently 240 ms +/- 15 ms.
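A way to cross-check those numbers locally (sketched below; this is not in the original app and reuses resource and otlp_metric_reader from the setup above) would be to add a second, console-based metric reader so the recorded response_time values can be compared against what the Collector prints:
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter

# Debugging aid: also print metrics to stdout every 10 s for local comparison.
console_reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(), export_interval_millis=10_000
)
mprovider = MeterProvider(resource=resource, metric_readers=[otlp_metric_reader, console_reader])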
I also experienced this. It means you are trying to connect to the endpoint over TLS while the collector is serving plain, unencrypted gRPC.
Try setting
--exporter_otlp_insecure true
if you're experiencing this.
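If you construct the exporters in code, as in the setup above, the equivalent is to ask for a plaintext channel. This is a minimal sketch assuming the gRPC OTLP exporters from that snippet; both the insecure argument and the OTEL_EXPORTER_OTLP_INSECURE environment variable are supported by the Python gRPC OTLP exporters:
# Option 1: request a plaintext (non-TLS) gRPC channel explicitly.
otlp_processor = BatchSpanProcessor(
    OTLPSpanExporter(endpoint=OTLP_ENDPOINT, insecure=True)
)
otlp_metric_reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint=OTLP_ENDPOINT, insecure=True)
)

# Option 2: leave the code as-is and set the environment variable instead:
#   OTEL_EXPORTER_OTLP_INSECURE=true OTLP_ENDPOINT=127.0.0.1:4317 flask run --port 8080
With either change, the StatusCode.UNAVAILABLE retries should stop and spans should show up in the collector logs alongside the metrics.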