No trace_id set in error reports when using OpenTelemetry
How do you use Sentry?
Sentry Saas (sentry.io)
Version
2.6.0
Steps to Reproduce
I run a FastAPI server that uses libraries instrumented with OpenTelemetry, so I've set up Sentry to use OTel. That works fine for reporting performance metrics via OpenTelemetry's FastAPI integration, but my error reports get no trace_id propagation, whether I look in before_send or in the web UI. I send test requests to the server from a Sentry-instrumented frontend app.
Here is a minimal reproducible sample code:
import os

import sentry_sdk
import uvicorn
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.propagate import set_global_textmap
from opentelemetry.sdk.trace import TracerProvider
from sentry_sdk.integrations.opentelemetry.propagator import SentryPropagator
from sentry_sdk.integrations.opentelemetry.span_processor import SentrySpanProcessor

dsn = os.getenv("SENTRY_DSN")


def check_event(event, hint):
    print(event)
    return event


sentry_sdk.init(
    dsn=dsn,
    sample_rate=1.0,
    enable_tracing=True,
    traces_sample_rate=1.0,
    environment="local-dev",
    instrumenter="otel",
    before_send=check_event,
)

provider = TracerProvider()
provider.add_span_processor(SentrySpanProcessor())
trace.set_tracer_provider(provider)
set_global_textmap(SentryPropagator())

app = FastAPI(docs_url=None)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # TODO limit
    allow_headers=["*"],
    allow_methods=["GET", "POST"],
)


@app.post("/sessions")
async def create_session(request: Request):
    current_span = trace.get_current_span()
    transaction = sentry_sdk.Hub.current.scope.transaction
    headers = dict(request.headers)
    print(headers.get("sentry-trace"))
    # Deliberately raise an error to test Sentry error reporting
    division_by_zero = 1 / 0


FastAPIInstrumentor.instrument_app(app)

if __name__ == "__main__":
    # Port 7000 in prod, 7001 in dev to not collide with others
    uvicorn.run(app, host="0.0.0.0", port=7001)
Expected Result
The trace_id is shown in the Sentry error report.
Actual Result
I can see that the request headers arrive with sentry-trace and baggage, and that they are picked up by Sentry. The correct trace_id is shown when inspecting the parent span of current_span given by OTel in the code above. But transaction is empty (not sure if this is supposed to work).
But when I trigger the error, the reported error lacks the trace_id and any links to traces. However, it seems to have picked up some of the transaction information from OTel. See:
When checking the Performance metrics in Sentry, I can look up the transaction for POST /sessions and find the original trace_id. I can even find the division-by-zero error listed at the bottom of the transaction page, and when I click it I get to the event in the screenshot above, but there it again says there is no trace ID.
I can add another clue: if I change the division_by_zero line to logger.error("This is an error message") instead, the trace_id is shown in the resulting issue in the Sentry web UI. So the way the error is created makes a difference.
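For reference, here is that logging variant of the route — a sketch assuming a standard logging logger (which the repro above does not set up) and a placeholder return value:

import logging

logger = logging.getLogger(__name__)


@app.post("/sessions")
async def create_session(request: Request):
    # Logging an error instead of raising: the resulting Sentry issue does
    # carry the trace_id, unlike the ZeroDivisionError above.
    logger.error("This is an error message")
    return {}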
Hi @ripperdoc – thanks for the issue report!
This does indeed appear to be a bug. The cause is essentially the ordering of the OTel and Sentry instrumentations. Right now, the Sentry error instrumentation sits one layer outside the OTel middleware. So, by the time the Sentry error instrumentation has caught the error, the OTel span has already ended and the trace_id is therefore lost with the span. You can visualize the current process like so:
The logging.error has a trace ID because the Sentry logging instrumentation gets called while the OTel span is still running, so we are able to obtain the trace_id and set it on the event.
The solution, at a very high level, will essentially be to reorder the Sentry and OTel instrumentations to look as follows:
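To make the ordering concrete, here is a rough, purely illustrative sketch of the two layerings; the wrapper names are invented for illustration and this is not the actual SDK or middleware code:

import functools

from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def otel_layer(handler):
    # Inner layer: runs the request inside an OTel span; the span ends as soon
    # as the handler returns or raises.
    @functools.wraps(handler)
    def wrapped(request):
        with tracer.start_as_current_span("http.request"):
            return handler(request)
    return wrapped


def sentry_error_layer(handler):
    # Error-capture layer. In the current ordering it sits *outside*
    # otel_layer, so by the time an exception reaches it the span has already
    # ended, trace.get_current_span() no longer points at the request's span,
    # and the error event is built without a trace_id.
    @functools.wraps(handler)
    def wrapped(request):
        try:
            return handler(request)
        except Exception:
            _span = trace.get_current_span()  # already invalid/ended here
            # an error event would be captured at this point
            raise
    return wrapped


# Current ordering (trace_id lost):
#     handler = sentry_error_layer(otel_layer(view))
# Reordered as described above (error capture runs while the span is current):
#     handler = otel_layer(sentry_error_layer(view))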
Thanks @szokeasaurusrex for the quick response! Just to clarify: is there any action I can take at this point, or will this need to be solved in Sentry internals?
@ripperdoc Unfortunately, this is likely something that needs to be resolved in the Sentry SDK itself – I cannot think of any simple way to work around this problem.
@szokeasaurusrex according to your diagram, could reordering the instrumentation calls solve the issue? Or would it harm other core functionality? Both the Sentry and OTel docs say to "call as early as possible".
For example, calling FastAPIInstrumentor.instrument_app(app) first and then sentry_sdk.init().
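Roughly this ordering (an untested sketch; only the call order differs from the repro above):

# Untested: OTel setup and FastAPI instrumentation first, Sentry init last.
provider = TracerProvider()
provider.add_span_processor(SentrySpanProcessor())
trace.set_tracer_provider(provider)
set_global_textmap(SentryPropagator())

app = FastAPI(docs_url=None)
FastAPIInstrumentor.instrument_app(app)

sentry_sdk.init(
    dsn=dsn,
    traces_sample_rate=1.0,
    instrumenter="otel",
    before_send=check_event,
)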
@mark-x-omgene I guess the best way to find out would be to try; I don't remember whether I tried reordering the calls when initially looking into this issue. Reordering could have other unintended consequences, but it might work.
Please report back if reordering the calls solves the problem for you.
Folks, since there's a proposed solution, I'll close this. Please reopen if the workaround is not viable.
I too am encountering this issue, and sadly the order of initialization does not make a difference. After doing a bit of rooting around, it looks to be due to how the OpenTelemetry instrumentation interacts with the FastAPI exception handler: when an exception is handled, the trace seems to be cut off prematurely, resulting in incomplete data and missing child spans, which lines up with the scenario presented here and with my own experience.
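One way to see this is to check whether an OTel span is still current at the moment the error event is built — a diagnostic sketch only, not a fix (before_send runs synchronously when the event is captured):

import os

import sentry_sdk
from opentelemetry import trace


def check_event(event, hint):
    # Diagnostic only: if the request span has already ended by the time the
    # error event is built, the current span context is invalid and the event
    # goes out without a trace_id.
    ctx = trace.get_current_span().get_span_context()
    print("span valid:", ctx.is_valid, "trace_id:", format(ctx.trace_id, "032x"))
    return event


sentry_sdk.init(
    dsn=os.getenv("SENTRY_DSN"),
    instrumenter="otel",
    before_send=check_event,
)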
There is a PR currently open on the OpenTelemetry Python contrib repo that should hopefully address this issue. While I have not yet been able to confirm this in my environment, this comment affirms that it does.