
No trace_id set in error reports when using OpenTelemetry

ripperdoc opened this issue 1 year ago

How do you use Sentry?

Sentry Saas (sentry.io)

Version

2.6.0

Steps to Reproduce

I run a FastAPI server that uses libraries instrumented with OpenTelemetry, so I've set up Sentry to use OTel. That works fine for reporting performance metrics via OpenTelemetry's FastAPI integration, but my error reports don't get any trace_id propagation, neither in before_send nor in the Web UI. I send test requests to it from a Sentry-instrumented frontend app.

Here is a minimal reproducible sample code:

import os

import sentry_sdk
import uvicorn
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.propagate import set_global_textmap
from opentelemetry.sdk.trace import TracerProvider
from sentry_sdk.integrations.opentelemetry.propagator import SentryPropagator
from sentry_sdk.integrations.opentelemetry.span_processor import SentrySpanProcessor

dsn = os.getenv("SENTRY_DSN")


def check_event(event, hint):
    print(event)
    return event


sentry_sdk.init(
    dsn=dsn,
    sample_rate=1.0,
    enable_tracing=True,
    traces_sample_rate=1.0,
    environment="local-dev",
    instrumenter="otel",
    before_send=check_event,
)
# Route OTel spans and trace propagation through Sentry
provider = TracerProvider()
provider.add_span_processor(SentrySpanProcessor())
trace.set_tracer_provider(provider)
set_global_textmap(SentryPropagator())

app = FastAPI(docs_url=None)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # TODO limit
    allow_headers=["*"],
    allow_methods=["GET", "POST"],
)


@app.post("/sessions")
async def create_session(request: Request):
    # For debugging: the OTel span carries the correct trace_id,
    # but the Sentry transaction on the scope is empty
    current_span = trace.get_current_span()
    transaction = sentry_sdk.Hub.current.scope.transaction

    headers = dict(request.headers)
    print(headers.get("sentry-trace"))

    # Deliberately raise an unhandled error to produce a Sentry error event
    division_by_zero = 1 / 0


FastAPIInstrumentor.instrument_app(app)


if __name__ == "__main__":
    # Port 7000 in prod, 7001 in dev to not collide with others
    uvicorn.run(app, host="0.0.0.0", port=7001)

Expected Result

The trace_id is shown in the Sentry error report.

Actual Result

I can see that the headers are received with sentry-trace and baggage, and that they are picked up by Sentry. The correct trace_id is shown when inspecting the parent span of current_span given by OTel in the code above. But transaction is empty (not sure if this is supposed to work).

When I cause the error, however, the reported error lacks the trace_id and any links to traces. It does seem to have picked up some of the transaction information from OTel, though. See:

[Screenshot: the error event in the Sentry web UI, showing no trace ID]

When checking the Performance metrics in Sentry, I can look up that transaction for POST /sessions and find the original trace_id. I can even find the division-by-zero error listed at the bottom of the transaction page, and when I click it I get to the event in the screenshot above, but there it again says it has no trace ID.
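
For reference, the missing trace context can also be inspected directly in before_send with something like this (a variation on the check_event callback from the repro above, not part of the original code):

def check_event(event, hint):
    # On the division-by-zero error event, the trace context has no trace_id
    trace_ctx = event.get("contexts", {}).get("trace", {})
    print("trace_id on event:", trace_ctx.get("trace_id"))
    return event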

ripperdoc avatar Jul 01 '24 12:07 ripperdoc

I can add another clue: if I change the division_by_zero line to instead say logger.error("This is an error message"), the trace_id is shown in the resulting issue in the Sentry web UI. So there is a difference between the different ways the error is created.
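
For clarity, that variant of the endpoint looks roughly like this (the logger setup is assumed, since it is not part of the snippet above):

import logging

logger = logging.getLogger(__name__)


@app.post("/sessions")
async def create_session(request: Request):
    # Logging an error instead of raising one produces an issue that does carry the trace_id
    logger.error("This is an error message")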

ripperdoc avatar Jul 01 '24 13:07 ripperdoc

Hi @ripperdoc – thanks for the issue report!

This does indeed appear to be a bug. The cause is essentially the ordering of the OTel and Sentry instrumentations. Right now, the Sentry error instrumentation sits one layer outside of the OTel middleware. So, by the time the Sentry error instrumentation catches the error, the OTel span has already ended, and the trace_id is lost along with the span. You can visualize the current process like so:

[Diagram: current ordering – the Sentry error instrumentation wraps the OTel middleware, so the span has already ended when the error is captured]

The logging.error call has a trace ID because the Sentry logging instrumentation gets called while the OTel span is still running, so we are able to obtain the trace_id and set it on the event.


The solution, at a very high level, will essentially be to reorder the Sentry and OTel instrumentations to look as follows:

[Diagram: proposed ordering – the OTel middleware wraps the Sentry error instrumentation, so the span is still active when the error is captured]
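
In rough pseudocode, the two orderings look something like this (an illustration only, not the actual SDK internals; handle is a stand-in for the route handler):

from opentelemetry import trace
from sentry_sdk import capture_exception

tracer = trace.get_tracer(__name__)


def current_ordering(request, handle):
    # Sentry error instrumentation is the outer layer today
    try:
        # The OTel middleware starts (and ends) the span inside it
        with tracer.start_as_current_span("POST /sessions"):
            return handle(request)  # ZeroDivisionError raised here
    except Exception:
        # The span has already ended, so no trace_id is available for the error event
        capture_exception()
        raise


def proposed_ordering(request, handle):
    # The OTel middleware becomes the outer layer
    with tracer.start_as_current_span("POST /sessions"):
        try:
            return handle(request)
        except Exception:
            # The span is still active, so the trace_id can be attached to the error event
            capture_exception()
            raise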

szokeasaurusrex avatar Jul 01 '24 15:07 szokeasaurusrex

Thanks @szokeasaurusrex for quick response! Just to clarify, is there any action I can take at this point or will this need to be solved in Sentry internals?

ripperdoc avatar Jul 01 '24 18:07 ripperdoc

@ripperdoc Unfortunately, this is likely something that needs to be resolved in the Sentry SDK itself – I cannot think of any simple way to work around this problem.

szokeasaurusrex avatar Jul 02 '24 08:07 szokeasaurusrex

@szokeasaurusrex according to your diagram, could reordering the instrumentation calls solve the issue, or would it harm other core functionality? Both the Sentry and OTel docs say to call the setup "as early as possible". For example, calling FastAPIInstrumentor.instrument_app(app) first and then sentry_sdk.init().
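
The reordering I have in mind would look roughly like this (an untested sketch based on the repro code above):

app = FastAPI(docs_url=None)

# Instrument FastAPI with OTel first...
FastAPIInstrumentor.instrument_app(app)

# ...then initialize Sentry and wire it into the OTel pipeline afterwards
sentry_sdk.init(
    dsn=dsn,
    traces_sample_rate=1.0,
    instrumenter="otel",
)
provider = TracerProvider()
provider.add_span_processor(SentrySpanProcessor())
trace.set_tracer_provider(provider)
set_global_textmap(SentryPropagator())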

mark-x-omgene avatar Jan 02 '25 10:01 mark-x-omgene

@mark-x-omgene I guess the best way to find out would be to try; I don't remember whether I tried reordering the calls when initially looking into this issue. Reordering could have other unintended consequences, but it might work.

Please report back if reordering the calls solves the problem for you.

szokeasaurusrex avatar Jan 03 '25 08:01 szokeasaurusrex

Folks, since there's a proposed solution, I'll close this. Please reopen if the workaround is not viable.

sentrivana avatar Mar 12 '25 15:03 sentrivana

I too am encountering this issue, and sadly the order of initialization does not make a difference. After doing a bit of digging, it looks to be due to how the OpenTelemetry instrumentation interacts with the FastAPI exception handler. When an exception is handled, the trace seems to be cut off prematurely, resulting in incomplete data and missing child spans, which lines up with the scenario presented here and with my own experience.

There is a PR currently open on the OpenTelemetry Python contrib repo that should hopefully address this issue. While I have not yet been able to confirm whether it does in my environment, this comment affirms that it does.

henworth avatar Apr 14 '25 18:04 henworth