
FastAPI auto-instrumentation sends extra spans for a single API execution

Open RunzhongHuang opened this issue 3 years ago • 21 comments

Hi open-telemetry community,

I have a Python project with FastAPI auto-instrumentation which exports spans to a Zipkin server, and I have a simple "GET" API which returns a "hello". But I noticed it generates 3 spans instead of 1.

(screenshot: Zipkin trace showing the three spans)

As you can see, the top span corresponds to the request I sent. But I am not sure why the other 2 child spans are there.

Describe your environment

python = 3.8
opentelemetry-api = 1.7.1
opentelemetry-sdk = 1.7.1
opentelemetry-exporter-zipkin = 1.7.1
opentelemetry-instrumentation-fastapi = 0.26b1
fastapi = 0.70.0
uvicorn = 0.15.0

Steps to reproduce

from opentelemetry import trace
from opentelemetry.exporter.zipkin.json import ZipkinExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
import uvicorn

app = FastAPI()

tracer = trace.get_tracer(__name__)

zipkin_exporter = ZipkinExporter(
    endpoint="https://bolt-msl-zipkin-server.eks-dev-us-east-1.container.spglobal.com:30443/api/v2/spans",
)
span_processor = SimpleSpanProcessor(zipkin_exporter)
trace_provider = TracerProvider(resource=Resource.create({"service.name": "open-telemetry-test"}))
trace_provider.add_span_processor(span_processor)

@app.get('/opentelemetry-fastapi')
def about():
    return "hello"


FastAPIInstrumentor.instrument_app(app,
                                   tracer_provider=trace_provider)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

Here is the REST request I made:

curl --location --request GET 'http://localhost:8000/opentelemetry-fastapi'

Here is the JSON span metadata:

[{
	"traceId": "7e5511d0c28934bf480bc7f28ba2d34d",
	"parentId": "76122e06995e41b6",
	"id": "a920c88a2a64c8e2",
	"name": "/opentelemetry-fastapi http send",
	"timestamp": 1639091853270289,
	"duration": 1002,
	"localEndpoint": {
		"serviceName": "open-telemetry-test"
	},
	"tags": {
		"otel.library.name": "opentelemetry.instrumentation.asgi",
		"otel.library.version": "0.26b1",
		"service.name": "open-telemetry-test",
		"telemetry.sdk.language": "python",
		"telemetry.sdk.name": "opentelemetry",
		"telemetry.sdk.version": "1.7.1",
		"type": "http.response.body"
	},
	"debug": true
}, {
	"traceId": "7e5511d0c28934bf480bc7f28ba2d34d",
	"id": "76122e06995e41b6",
	"kind": "SERVER",
	"name": "/opentelemetry-fastapi",
	"timestamp": 1639091853034614,
	"duration": 289682,
	"localEndpoint": {
		"serviceName": "open-telemetry-test"
	},
	"tags": {
		"correlation-id": "111222333444555",
		"http.flavor": "1.1",
		"http.host": "127.0.0.1:8000",
		"http.method": "GET",
		"http.route": "/opentelemetry-fastapi",
		"http.scheme": "http",
		"http.server_name": "localhost:8000",
		"http.status_code": "200",
		"http.target": "/opentelemetry-fastapi",
		"http.url": "http://127.0.0.1:8000/opentelemetry-fastapi",
		"http.user_agent": "PostmanRuntime/7.28.4",
		"net.host.port": "8000",
		"net.peer.ip": "127.0.0.1",
		"net.peer.port": "55402",
		"otel.library.name": "opentelemetry.instrumentation.asgi",
		"otel.library.version": "0.26b1",
		"region": "us-east-1",
		"service.name": "open-telemetry-test",
		"telemetry.sdk.language": "python",
		"telemetry.sdk.name": "opentelemetry",
		"telemetry.sdk.version": "1.7.1"
	},
	"debug": true
}, {
	"traceId": "7e5511d0c28934bf480bc7f28ba2d34d",
	"parentId": "76122e06995e41b6",
	"id": "edd83d58575f0811",
	"name": "/opentelemetry-fastapi http send",
	"timestamp": 1639091853053617,
	"localEndpoint": {
		"serviceName": "open-telemetry-test"
	},
	"tags": {
		"http.status_code": "200",
		"otel.library.name": "opentelemetry.instrumentation.asgi",
		"otel.library.version": "0.26b1",
		"service.name": "open-telemetry-test",
		"telemetry.sdk.language": "python",
		"telemetry.sdk.name": "opentelemetry",
		"telemetry.sdk.version": "1.7.1",
		"type": "http.response.start"
	},
	"debug": true
}]

What is the expected behavior? Exactly one span for the executed API call.

What is the actual behavior? Besides the span for the executed API call, 2 extra spans are generated.

Additional context

I suspect this is related to the ASGI interface, since the extra spans have "send" in their names, which relates to sending the response back. But shouldn't there be a single "send" instead of two?

I feel like I may be missing something conceptually here. I am looking forward to your suggestions.

Thanks for your time!

RunzhongHuang avatar Dec 10 '21 05:12 RunzhongHuang

I'm also experiencing the same issue when using FastAPI + X-Ray.

(screenshot: X-Ray trace showing the same extra spans)

mohdnr avatar Dec 19 '21 16:12 mohdnr

At first glance, this seems to be because of the async nature of FastAPI. If you look at the type attribute, it hints that two distinct events are being recorded.

		"type": "http.response.start"

and

		"type": "http.response.body"

It appears that the first span represents sending back the headers (initiating the response), while the second one represents sending the response body.
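
For illustration (a minimal sketch, not taken from this issue), a bare ASGI application delivers one response through two separate send() calls, one per ASGI message, which is why the ASGI instrumentation records one "http send" span per message:

# Hypothetical bare ASGI app, for illustration only
async def app(scope, receive, send):
    assert scope["type"] == "http"
    await receive()  # "http.request" event (the "http receive" span)

    # First ASGI message: status and headers ("http.response.start")
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })

    # Second ASGI message: the body ("http.response.body")
    await send({
        "type": "http.response.body",
        "body": b"hello",
    })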

owais avatar Dec 20 '21 08:12 owais

Just reviewed my logs and I'm seeing what you mean. Is the practice to tie all of these together downstream, where you're storing your data?

type: STRING(http.response.start)
type: STRING(http.response.start)
type: STRING(http.request)
type: STRING(http.response.body)
type: STRING(http.response.body)
type: STRING(http.response.body)
type: STRING(http.disconnect)

mohdnr avatar Dec 21 '21 15:12 mohdnr

They are all part of the same trace so generally backends do store them together or close to each other if that is what you meant.

owais avatar Dec 21 '21 18:12 owais

They are all part of the same trace so generally backends do store them together or close to each other if that is what you meant.

That makes sense, thanks for your help! I see them all grouped when I added X-Ray as a data source in Grafana.

mohdnr avatar Dec 21 '21 18:12 mohdnr

If you want to disable them, we would need this -> https://github.com/open-telemetry/opentelemetry-python-contrib/pull/815

raphaelauv avatar Dec 29 '21 10:12 raphaelauv

@owais thanks for the explanation of these types. I totally understand why the internal ASGI interactions are captured within the same trace (e.g., for debugging purposes).

I think the reason I raised this question is that these extra spans become confusing in the Zipkin GUI (the example I provided only has 2 extra spans, but another of my projects generates more, since it not only serves the incoming request but also makes external requests). I would like to know if there is a way to suppress these 'internal' spans in order to have a cleaner service trace chain, or whether the way I coded it is incorrect.

@raphaelauv thank you very much! I will keep an eye on the status of the PR #815

RunzhongHuang avatar Jan 04 '22 20:01 RunzhongHuang

Hi @raphaelauv, I noticed that PR #815 was closed without being merged. I assume that means this little patch is not available in the release. What other approaches might there be to disable these internal spans?

RunzhongHuang avatar Mar 08 '22 15:03 RunzhongHuang

I stumbled across the same issue and managed to get rid of these extra spans fairly easily by overriding two methods in the OpenTelemetryMiddleware:

from functools import wraps

from opentelemetry import trace
# set_status_code and asgi_setter are module-level helpers in the ASGI instrumentation
from opentelemetry.instrumentation.asgi import (
    OpenTelemetryMiddleware,
    asgi_setter,
    set_status_code,
)
from opentelemetry.instrumentation.propagators import get_global_response_propagator
from opentelemetry.trace.propagation import set_span_in_context
from quart import Quart

class MyOpenTelemetryMiddleware(OpenTelemetryMiddleware):

    # Returning the original receive callable skips the per-message
    # "http receive" spans entirely.
    def _get_otel_receive(self, server_span_name, scope, receive):
        return receive

    # Same send logic as the stock middleware, but without starting a child
    # span per outgoing ASGI message.
    def _get_otel_send(self, server_span, server_span_name, scope, send):
        @wraps(send)
        async def otel_send(message):
            if message["type"] == "http.response.start":
                set_status_code(server_span, message["status"])
            elif message["type"] == "websocket.send":
                set_status_code(server_span, 200)

            propagator = get_global_response_propagator()
            if propagator:
                propagator.inject(
                    message,
                    context=set_span_in_context(
                        server_span, trace.context_api.Context()
                    ),
                    setter=asgi_setter,
                )

            await send(message)

        return otel_send

# in create_app()
app = Quart(__name__)
app.asgi_app = MyOpenTelemetryMiddleware(app.asgi_app)

nils-borrmann-y42 avatar May 03 '22 15:05 nils-borrmann-y42

I stumbled across the same issue and managed to get rid of these extra spans fairly easily by overriding two methods in the OpenTelemetryMiddleware.

@nils-borrmann-y42 thanks for the workaround, it works perfectly. The CancelledError exception issue #467 in the internal spans under Python 3.8 could also be fixed with this workaround.

blueswen avatar May 04 '22 15:05 blueswen

Hi @Blueswen and @nils-borrmann-y42. Thanks for your replies and confirmation on this issue. I have a quick question regarding this overridden "OpenTelemetryMiddleware".

The "FastAPIInstrumentor" class doesn't expose any parameter to pass an "OpenTelemetryMiddleware" object; it directly adds "OpenTelemetryMiddleware" as the middleware. May I ask how you utilize your "MyOpenTelemetryMiddleware"? Is there anything I missed here?

(screenshot: FastAPIInstrumentor.instrument_app source code)

RunzhongHuang avatar May 24 '22 18:05 RunzhongHuang

This gist is my workaround, which copies the entire FastAPIInstrumentor code, then adds the custom middleware (from line 167) and imports some required modules (from line 148).

@RunzhongHuang, as you said, there is no parameter to pass an "OpenTelemetryMiddleware" object, so I think this is the only way to do the hack.

blueswen avatar May 25 '22 00:05 blueswen

I actually only used it for the Quart framework, where the OpenTelemetryMiddleware is used directly. It looks like it is a bit more complicated for FastAPI.
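
One possible shortcut (purely an untested sketch; keyword names may differ between versions): since FastAPIInstrumentor.instrument_app essentially registers OpenTelemetryMiddleware via app.add_middleware, the subclass might be registered the same way. Note this skips the route-details callback the instrumentor normally passes, so server span names may show the raw path instead of the route:

from fastapi import FastAPI

app = FastAPI()

# Hypothetical wiring, mirroring what FastAPIInstrumentor.instrument_app does
# internally; trace_provider is the TracerProvider configured earlier.
app.add_middleware(
    MyOpenTelemetryMiddleware,
    tracer_provider=trace_provider,
)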

nils-borrmann-y42 avatar May 25 '22 15:05 nils-borrmann-y42

Thank you @Blueswen and @nils-borrmann-y42! I thought the same, and there seems to be no other way around it. Let me test it out on my project.

RunzhongHuang avatar May 25 '22 21:05 RunzhongHuang

This feature of creating a span for the initial ASGI scope plus the send/receive events is simultaneously really helpful (I used it to discover that some requests were timing out after receive but before send) and really noisy (multipart messages generate one receive span per packet received, so a single trace can end up with hundreds of spans). It would be lovely to have more configurability for turning these internal spans on/off, or for controlling when they trigger.

xkortex avatar Sep 29 '22 16:09 xkortex

Thumbs up for adding more configuration. I need to use the hack that @blueswen shared. Looking forward to having it as a configuration parameter. It seems like a very quick and small fix, but the ticket has been open for over a year now.

ehsannm avatar Oct 21 '22 07:10 ehsannm

You are always welcome to send a fix :)

srikanthccv avatar Oct 21 '22 14:10 srikanthccv

@srikanthccv Someone already did and the PR was rejected: https://github.com/open-telemetry/opentelemetry-python-contrib/pull/815

nils-borrmann-y42 avatar Oct 22 '22 20:10 nils-borrmann-y42

It wasn't rejected; it was closed by the author without any further explanation. I don't know why they decided to do that. One can drop the spans by other means, such as a custom processor or sampler, but I think configurability on the instrumentation is also fine, especially because dropping a span that is used for propagation breaks the tree structure of the trace.
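
As a rough sketch of the sampler route (untested, written against opentelemetry-sdk 1.x; it drops the ASGI-internal spans by name at creation time and otherwise always samples, ignoring any parent decision):

from opentelemetry.sdk.trace.sampling import ALWAYS_ON, Decision, Sampler, SamplingResult


class DropAsgiInternalSpansSampler(Sampler):
    # Drops the per-message ASGI spans, keeps everything else.
    def should_sample(self, parent_context, trace_id, name, kind=None,
                      attributes=None, links=None, trace_state=None):
        # The ASGI instrumentation names these spans "<route> http send" /
        # "<route> http receive" (see the JSON above).
        if name.endswith(("http send", "http receive")):
            return SamplingResult(Decision.DROP)
        return ALWAYS_ON.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )

    def get_description(self):
        return "DropAsgiInternalSpansSampler"

# Usage sketch: TracerProvider(sampler=DropAsgiInternalSpansSampler(), resource=resource)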

srikanthccv avatar Oct 23 '22 03:10 srikanthccv

I did it by implementing a SpanProcessor.

I override on_end of BatchSpanProcessor and skip the span when its kind is INTERNAL and its type attribute is one of http.request, http.response.start, http.response.body:

    def on_end(self, span: ReadableSpan) -> None:
        if span.kind == SpanKind.INTERNAL and (
                span.attributes.get('type', None) in ('http.request',
                                                      'http.response.start',
                                                      'http.response.body')
        ):
            return
        super().on_end(span=span)

Full Code

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace import SpanKind


class MySpanProcessor(BatchSpanProcessor):
    def on_end(self, span: ReadableSpan) -> None:
        if span.kind == SpanKind.INTERNAL and (
                span.attributes.get('type', None) in ('http.request',
                                                      'http.response.start',
                                                      'http.response.body')
        ):
            return
        super().on_end(span=span)

and attach it to the provider

provider = TracerProvider(resource=resource)
processor = MySpanProcessor(span_exporter=exporter)
provider.add_span_processor(span_processor=processor)
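
For completeness (a sketch reusing the setup from the original reproduction; the Zipkin endpoint below is a placeholder), the resource and exporter referenced above could be:

from opentelemetry.exporter.zipkin.json import ZipkinExporter
from opentelemetry.sdk.resources import Resource

# Placeholder endpoint; substitute your own Zipkin collector URL
exporter = ZipkinExporter(endpoint="http://localhost:9411/api/v2/spans")
resource = Resource.create({"service.name": "open-telemetry-test"})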

outstanding1301 avatar Oct 20 '23 04:10 outstanding1301

any updates on this?

nirfuzz avatar Apr 21 '24 13:04 nirfuzz