
Adding Opentelemetry to MCP SDK

Open fali007 opened this issue 8 months ago • 10 comments

Is your feature request related to a problem? Please describe.

I would like to see OpenTelemetry traces and metrics baked into the SDK, e.g.:

  1. Traces emitted for all requests on both the client and server side of the MCP SDK.
  2. Metrics for tool/prompt/resource calls.

Describe the solution you'd like

I am trying to use the OpenTelemetry SDK to add tracing and metrics.

Steps

  1. Create a session in the client and the server.
  2. Initialise a tracer (an OpenTelemetry tracer instance) in BaseSession (src/mcp/shared/session.py).
  3. Bind the trace context (traceparent) to the _meta of RequestParams.Meta before sending a request from the client (send_request in src/mcp/shared/session.py).
  4. On the server, use the traceparent from the received request to create a new span with the incoming span as its parent (see the sketch below).
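
To illustrate steps 3 and 4, here is a minimal sketch using the OpenTelemetry propagation API, with the _meta dictionary of the request params acting as the carrier (the helper names here are just for illustration, not part of the SDK):

from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer("mcp.demo")

# Client side (step 3): build a dict that would be attached as the request's
# _meta before send_request is called.
def build_meta_with_trace_context() -> dict:
    meta: dict = {}
    inject(meta)  # writes the W3C "traceparent" (and "tracestate") keys
    return meta

# Server side (step 4): extract the context from the received _meta and start
# a new span whose parent is the incoming client span.
def handle_request_with_tracing(meta: dict, request_name: str):
    parent_ctx = extract(meta or {})
    with tracer.start_as_current_span(request_name, context=parent_ctx) as span:
        span.set_attribute("mcp.request.name", request_name)
        # ... dispatch to the actual request handler here ...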

Describe alternatives you've considered

I tried to implement the same via Traceloop, but I was not able to get the correct parent-child relationship between spans that way.

Additional context

I am able to achieve distributed tracing across my Agentic application and MCP server.

[screenshot]

fali007 avatar Apr 03 '25 12:04 fali007

Related issue for a feature request in traceloop/openllmetry to enable observability for MCP servers: https://github.com/traceloop/openllmetry/issues/2662

hk-bmi avatar Apr 03 '25 14:04 hk-bmi

I'm now trying to inject a tracer for each request by defining a SofaTracerMiddleware (which inherits from BaseHTTPMiddleware) and attaching it to the MCP server application via add_middleware.

import opentracing
from opentracing.propagation import Format
from starlette.middleware.base import BaseHTTPMiddleware

from mcp.server.fastmcp import FastMCP

# SPAN_CODE_HTTP_SERVER, _SCOPE and start_active_server_span come from our
# internal SOFA tracer integration, not from the standard opentracing API.

class SofaTracerMiddleware(BaseHTTPMiddleware):
    def __init__(self, app):
        super().__init__(app)

    @property
    def tracer(self):
        return opentracing.tracer

    async def dispatch(self, request, call_next):
        headers = request.headers
        # Extract the OpenTracing context from the incoming request headers
        try:
            input_context = self.tracer.extract(format=Format.HTTP_HEADERS, carrier=headers)
        except opentracing.InvalidCarrierException:
            input_context = None
        # Start a new span for the incoming request, parented to the extracted context
        with self.tracer.start_active_server_span(SPAN_CODE_HTTP_SERVER, context=input_context) as scope:
            span = scope.span
            span.url = str(request.url)
            span.method = request.method.upper()
            span.request_size = request.headers.get("content-length", 0)

            response = await call_next(request)

            span.result_code = str(response.status_code)
            span.response_size = response.headers.get("content-length", 0)
            current_context = _SCOPE.get()
            if current_context:
                print(f"Current tracer in worker: {current_context}")
            else:
                print("No active tracer in this context.")
        return response

app = FastMCP()
app.sse_app().add_middleware(SofaTracerMiddleware)

But I found that the requests posted to the messages URL all reused the tracer created for the initialize request; the subsequent initialize and tools calls did not pick up a new tracer.

wenxuwan avatar Apr 07 '25 06:04 wenxuwan

[screenshots]

It is also clear from the logs that the ContextVarsScope used by opentracing each time a request is processed is the one from the first request, even though handle_post_message and SofaTracerMiddleware see the latest ContextVarsScope for each request.

[screenshot]

So is there a way for me to get my own async ContextVarsScope for each request?

wenxuwan avatar Apr 08 '25 02:04 wenxuwan

Hi @wenxuwan, I tried your approach and I don't think it will work: if you propagate the trace context via HTTP headers, only the initialize request carries those headers. That's why you see the same parent for all subsequent requests as well.

I was tinkering with the Traceloop way of passing context by intercepting and wrapping the _handle_request function in mcp/server/lowlevel/server.py for server-side tracing and the send_request function in mcp/shared/session.py for client-side tracing. I am passing the traceparent in params.meta.traceparent of the RequestParams from types.py, and I am able to get the correct parent-child span relationship.
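
A rough sketch of the client-side half of that wrapping, assuming we monkey-patch BaseSession.send_request and that RequestParams.Meta tolerates extra keys such as traceparent (the exact signatures and field names may differ between SDK versions):

import mcp.types as types
from mcp.shared.session import BaseSession
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer("mcp.client.instrumentation")
_original_send_request = BaseSession.send_request

async def traced_send_request(self, request, result_type, **kwargs):
    root = request.root  # the concrete *Request model inside the wrapper
    with tracer.start_as_current_span(f"mcp.client.{root.method}"):
        carrier: dict = {}
        inject(carrier)  # adds "traceparent" (and possibly "tracestate")
        if root.params is not None:
            existing = root.params.meta.model_dump(by_alias=True) if root.params.meta else {}
            # Assumption: Meta allows extra keys, so traceparent rides along in _meta.
            root.params.meta = types.RequestParams.Meta(**existing, **carrier)
        return await _original_send_request(self, request, result_type, **kwargs)

BaseSession.send_request = traced_send_request

The server-side half is the mirror image: read params.meta, extract the context, and start the handler span with the extracted context as parent.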

The screenshot shows how the context is propagated back and forth between the agent application and the MCP servers; the parent-child span IDs are correct. [screenshot]

fali007 avatar Apr 08 '25 04:04 fali007


You are right, but that is not quite the same as my problem. Right now I'm not passing the tracer inside my HTTP request. Every time I handle a request with

with self.tracer.start_active_server_span(SPAN_CODE_HTTP_SERVER, context=input_context) as scope:

a new ContextVar value is set inside the current async task. But only one long-lived task has been processing the user's requests, so the tracer it sees is the ContextVar value from the first request. That's why handle_post_message gets the newest tracer every time, yet the code that actually processes the request is still using the tracer of the first request.
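
A standalone sketch of the effect (nothing MCP-specific; it just shows that a value set in one task's context is invisible to an already-running task):

import asyncio
import contextvars

current_trace = contextvars.ContextVar("current_trace", default="<none>")

async def worker(queue: asyncio.Queue) -> None:
    # Long-lived task: it copied its Context when it was created, so it keeps
    # seeing the value from that moment, no matter what later requests set
    # in their own task contexts.
    while True:
        request = await queue.get()
        print(f"worker handling {request}, sees trace={current_trace.get()}")
        queue.task_done()

async def handle_request(queue: asyncio.Queue, name: str) -> None:
    current_trace.set(f"trace-for-{name}")  # only visible in this task's context
    await queue.put(name)

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    worker_task = asyncio.create_task(worker(queue))
    await asyncio.create_task(handle_request(queue, "initialize"))
    await asyncio.create_task(handle_request(queue, "tools/call"))
    await queue.join()
    worker_task.cancel()

asyncio.run(main())
# Both lines print trace=<none>: the worker only ever sees the context it was
# started with, just like the session task here keeps using the first tracer.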

wenxuwan avatar Apr 08 '25 05:04 wenxuwan

Adding some links:

  • https://github.com/modelcontextprotocol/modelcontextprotocol/issues/246 - request to add tracing to the MCP spec
  • https://github.com/modelcontextprotocol/modelcontextprotocol/pull/414 - fix for the spec to use params._meta for trace context propagation, such as params._meta.traceparent
  • https://github.com/open-telemetry/semantic-conventions/pull/2083 - semantic convention proposal for MCP traces and metrics

samsp-msft avatar Apr 28 '25 23:04 samsp-msft

For the context propagation part, you can have a look at @anuraaga's code in OpenInference, or if such a change is welcome maybe he can help raise it. I don't know the SDK's policies about an OTel dependency, but since Python is flexible with imports, maybe it is fine to just add it directly?

codefromthecrypt avatar May 01 '25 01:05 codefromthecrypt

Workaround for the problem of ContextVars being lost in the async streams, which prevents other forms of context propagation from working: https://github.com/pydantic/logfire/issues/1459

alexmojaki avatar Oct 06 '25 14:10 alexmojaki

I’d like to +1 this and suggest a slightly more structured approach that could make observability easier to adopt across MCP deployments.

Concretely, it might help to:

  1. Add a small instrumentation interface at the SDK level, e.g. something like:

    from typing import Protocol

    class Instrumenter(Protocol):
        def on_request_start(self, request_id: str, meta: RequestMeta) -> None: ...
        def on_request_end(self, request_id: str, result: ResultMeta) -> None: ...
        def on_error(self, request_id: str, exc: BaseException) -> None: ...

    • This would be pluggable from both server and client.
    • A default "no-op" instrumenter keeps behavior unchanged for users who don’t opt in.
  2. Thread request IDs into logs / spans

    • Wherever request_id is available in RequestContext, ensure it is consistently passed to the instrumenter.
    • For logging, we could use logging’s extra={...} to include request_id as a structured field.
  3. Provide a small OpenTelemetry adapter as an optional extra (a sketch follows after this list)

    • Implement an OpenTelemetryInstrumenter in a separate module (or extra) that:

      • creates spans per MCP request/tool call,
      • emits basic metrics (latency, error counts) for tools/resources/prompts.
    • This would align well with the use case described in this issue.
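
To make item 3 concrete, here is a rough sketch of what such an adapter could look like, assuming the hypothetical Instrumenter protocol above (the class, module, and metric names are illustrative, not an existing package):

import time
from opentelemetry import metrics, trace

class OpenTelemetryInstrumenter:
    """Illustrative adapter: one span per MCP request plus basic latency/error metrics."""

    def __init__(self) -> None:
        self._tracer = trace.get_tracer("mcp.instrumentation")
        meter = metrics.get_meter("mcp.instrumentation")
        self._duration = meter.create_histogram("mcp.request.duration", unit="s")
        self._errors = meter.create_counter("mcp.request.errors")
        # Per-request state keyed by request_id so start and end can be paired.
        self._inflight: dict[str, tuple[trace.Span, float]] = {}

    def on_request_start(self, request_id: str, meta) -> None:
        span = self._tracer.start_span(f"mcp.request {request_id}")
        self._inflight[request_id] = (span, time.monotonic())

    def on_request_end(self, request_id: str, result) -> None:
        span, started = self._inflight.pop(request_id, (None, None))
        if span is None:
            return
        self._duration.record(time.monotonic() - started)
        span.end()

    def on_error(self, request_id: str, exc: BaseException) -> None:
        span, _ = self._inflight.pop(request_id, (None, None))
        self._errors.add(1, attributes={"exception.type": type(exc).__name__})
        if span is not None:
            span.record_exception(exc)
            span.set_status(trace.Status(trace.StatusCode.ERROR))
            span.end()

Pairing start and end via request_id keeps the protocol small, though it assumes request IDs are unique within a session.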

This keeps the core SDK decoupled from any particular telemetry stack, while making it straightforward to plug in OpenTelemetry, Sentry, or other systems via a small adapter.

If maintainers are open to this direction, I’d be happy to help draft a more concrete design or contribute an initial implementation.

dgenio avatar Nov 28 '25 12:11 dgenio

Noticed this since I happened to be mentioned here a long time ago, though I'm not a maintainer. Speaking generally about instrumenters, I want to note that this API can't work unless either:

  • instrument-start returns a value that is passed to instrument-end, or
  • it's a single instrument function accepting a "next"-style function to wrap.

Generally I'd recommend the latter. LangChain got stuck by going with the former with no return value. Now I can't open this link for whatever reason.

https://github.com/langchain-ai/langchain/discussions/27954
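
To be concrete, the "next function" shape I mean is roughly this (names are made up, not an existing API):

from typing import Any, Awaitable, Callable
from opentelemetry import trace

tracer = trace.get_tracer("mcp.instrumentation")
Handler = Callable[[Any], Awaitable[Any]]

# One wrapping-style hook: any state it needs lives on the stack across the
# call, so nothing has to be returned from a "start" and threaded into an "end".
async def tracing_instrument(request: Any, next_handler: Handler) -> Any:
    with tracer.start_as_current_span(f"mcp.{getattr(request, 'method', 'request')}"):
        return await next_handler(request)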

anuraaga avatar Nov 28 '25 13:11 anuraaga

OTel Python uses a ContextVar to propagate the trace/span context. A ContextVar is supposed to be a per-thread and per-task variable. Is the issue that the MCP library uses one task per connection instead of per request, so we get the ContextVars from when the connection started?

Or is it possibly due to where the task is getting its context from? According to this, an anyio task gets its context from the anyio task that calls its start method.

With asyncio.Task you can explicitly pass a Context.
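
For example (Python 3.11+; as far as I know, the anyio task groups the SDK uses don't take an explicit context argument):

import asyncio
import contextvars

request_id = contextvars.ContextVar("request_id", default=None)

async def handle(name: str) -> None:
    # Sees whatever request_id was in the Context the task was started with.
    print(name, "->", request_id.get())

async def main() -> None:
    request_id.set("req-1")
    ctx = contextvars.copy_context()  # snapshot of the current context
    request_id.set("req-2")

    # Without an explicit context, the task copies the *current* context (req-2).
    t1 = asyncio.create_task(handle("default context"))
    # With an explicit context, the task runs in the snapshot we took (req-1).
    t2 = asyncio.create_task(handle("explicit context"), context=ctx)
    await asyncio.gather(t1, t2)

asyncio.run(main())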

DylanRussell avatar Dec 10 '25 14:12 DylanRussell