
Aiohttp + asyncio tracing

alinagol opened this issue 4 years ago

Hi,

I have an aiohttp API with some async code inside that I'm trying to trace with Datadog. I'm following the aiohttp tracing documentation: trace_app(app, tracer, service="my-api")

The problem I encounter is that the Datadog trace shows an incorrect execution time and always reports status 200. Moreover, all logs emitted after the first await are not tagged with dd.trace_id and dd.span_id. I assume the execution time is likewise only measured up to the first await.

For example, a curl to an endpoint that returns a file can take around 30 seconds and sometimes ends with a timeout, while Datadog always reports an execution time of a few milliseconds and status code 200.

Below is the problematic endpoint:

class DownloadLogsView(BaseView):
    """Download logs."""

    async def get(self):

        log.info("Starting download")  # This log has correct dd.trace_id and dd.span_id

        app_id = self.request.match_info.get("app_id")
        client = self.request.app.get("elastic")

        filename = f"logs_{app_id}.txt"
        response = web.StreamResponse(
            headers={
                hdrs.CONTENT_TYPE: "text/plain; charset=utf-8",
                hdrs.CACHE_CONTROL: "no-cache",
                hdrs.CONTENT_DISPOSITION: f"attachment; filename={filename}",
                hdrs.TRANSFER_ENCODING: "chunked",
            }
        )
        await response.prepare(self.request)

        ### This is where trace and span ids are "lost" ###

        log.info(f"Download logs request for app: {app_id}")  # This log has dd.trace_id=0 and dd.span_id=0

        async for logline in es_client.scan_app_logs(client, app_id):  # Same for all logs from es_client
            await response.write(logline.encode("utf-8"))

        return response

Which version of dd-trace-py are you using?

ddtrace==0.40.0

Which version of the libraries are you using?

aiohttp==3.6.2
elasticsearch[async]==7.8.0
python-json-logger==v0.1.10
gunicorn==20.0.4

How can we reproduce your problem?

The problematic endpoint is provided above.

What is the result that you get?

Incomplete traces.

What is the result that you expected?

Traces with correct execution time and status codes, and corresponding logs to have correct trace ids.

alinagol avatar Aug 21 '20 13:08 alinagol

Async in general seems to be broken or not supported in the Datadog libraries. Half of my traces in Tornado are not getting registered; they just show one span and one execution time.

zikphil avatar Jan 05 '21 16:01 zikphil

Unfortunately, the current implementation of aiohttp tracing finishes the span when prepare is called, so it doesn't really support StreamResponse.

mklokocka avatar Feb 18 '21 17:02 mklokocka
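To make the failure mode above concrete, here is a stdlib-only simulation (no ddtrace or aiohttp required) of a span that is finished at prepare() time while the handler keeps streaming afterwards. The Span class, the handler, and the 0.2-second sleep are illustrative assumptions, not ddtrace internals.

```python
import asyncio
import time


class Span:
    """Toy stand-in for a tracing span: records start and finish times."""

    def __init__(self):
        self.start = time.monotonic()
        self.finish_time = None

    def finish(self):
        self.finish_time = time.monotonic()

    @property
    def duration(self):
        return self.finish_time - self.start


async def handler(span):
    # The aiohttp integration described above finishes the span here,
    # at the point where the response is "prepared" -- before any body
    # has been streamed.
    span.finish()
    # The actual streaming work happens after prepare():
    await asyncio.sleep(0.2)


async def main():
    span = Span()
    start = time.monotonic()
    await handler(span)
    real_duration = time.monotonic() - start
    return span.duration, real_duration


reported, actual = asyncio.run(main())
print(f"reported: {reported:.3f}s  actual: {actual:.3f}s")
```

The reported span duration is a few microseconds while the handler actually ran for over 200 ms, matching the "few ms exec time" the original report describes.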

@palazzem ddtrace needs to hook in after on_prepare; unfortunately there isn't a good hook for this. The best I've seen is that each request has its own asyncio task, so theoretically it could close the trace when the task is finished (add_done_callback). Thoughts? I could try to come up with something.

thehesiod avatar Nov 28 '22 22:11 thehesiod

Apologies for the very long response time from our team on this ticket.

This is an issue we'd like to fix, and it's been in our bugfix backlog for a while. If I understand correctly, the most promising fix involves calling add_done_callback on the asyncio task underlying an individual aiohttp request. This behavior could potentially be released behind a feature flag if it changes the behavior of existing synchronous tracing code.

emmettbutler avatar Nov 06 '23 15:11 emmettbutler
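The proposed fix can be sketched with the stdlib alone: rather than finishing the span at prepare(), attach a done-callback to the asyncio task serving the request, so the span closes only when the handler truly finishes. The Span class and serve_request function are illustrative names under that assumption, not ddtrace or aiohttp APIs.

```python
import asyncio
import time


class Span:
    """Toy span that records whether and when it was finished."""

    def __init__(self):
        self.start = time.monotonic()
        self.finished = False
        self.duration = None

    def finish(self, _task=None):
        # Accepts the task argument that add_done_callback passes in.
        self.finished = True
        self.duration = time.monotonic() - self.start


async def serve_request(span):
    # Each aiohttp request runs in its own asyncio task, so instead of
    # hooking prepare() we hook the completion of the task itself.
    task = asyncio.current_task()
    task.add_done_callback(span.finish)
    await asyncio.sleep(0.1)  # stand-in for streaming the response body


async def main():
    span = Span()
    await asyncio.create_task(serve_request(span))
    # Give the event loop one turn so done-callbacks have run.
    await asyncio.sleep(0)
    return span


span = asyncio.run(main())
print(f"finished: {span.finished}, duration: {span.duration:.3f}s")
```

Here the recorded duration covers the full streaming time, which is what the feature-flagged fix described above would aim for in the real integration.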