dd-trace-py
dd-trace-py copied to clipboard
Aiohttp + asyncio tracing
Hi,
I have an aiohttp API with some async code inside that I'm trying to trace with Datadog. I'm using documentation for aiohttp for tracing: trace_app(app, tracer, service="my-api")
The problem that I encounter is that DD trace shows incorrect execution time as well as status 200 all the time. Moreover, all logs after the first await are not tagged with dd.trace_id and dd.span_id. I assume that the execution time is also measured until the first await.
For example, a curl to the endpoint that returns a file can take around 30 seconds and sometimes ends up with a timeout exception, while Datadog always reports a few ms exec time and code 200.
Below is the problematic endpoint:
class DownloadLogsView(BaseView):
"""Download logs."""
async def get(self):
log.info("Starting download") # This log has correct dd.trace_id and dd.span_id
app_id = self.request.match_info.get("app_id")
client = self.request.app.get("elastic")
filename = f"logs_{app_id}.txt"
response = web.StreamResponse(
headers={
hdrs.CONTENT_TYPE: "text/plain; charset=utf-8",
hdrs.CACHE_CONTROL: "no-cache",
hdrs.CONTENT_DISPOSITION: f"attachment; filename={filename}",
hdrs.TRANSFER_ENCODING: "chunked",
}
)
await response.prepare(self.request)
### That is where trace and span ids are "lost" ###
log.info(f"Download logs request for app: {app_id}") # This log has dd.trace_id=0 and dd.span_id=0
async for logline in es_client.scan_app_logs(client, app_id): # Same for all logs from es_client
await response.write(logline.encode("utf-8"))
return response
Which version of dd-trace-py are you using?
ddtrace==0.40.0
Which version of the libraries are you using?
aiohttp==3.6.2
elasticsearch[async]==7.8.0
python-json-logger==v0.1.10
gunicorn==20.0.4
How can we reproduce your problem?
The problematic endpoint is provided above.
What is the result that you get?
Incomplete traces.
What is the result that you expected?
Traces with correct execution time and status codes, and corresponding logs to have correct trace ids.
async in general seems to be broken or not supported in the datadog libraries. Half my traces in tornado are not getting registered and they just show 1 span, 1 execution time.
Unfortunately the current implementation of aiohttp tracing finishes the span on a prepare call. So it doesn't really support StreamResponse.
@palazzem ddtrace needs to hook after on_prepare, unfortunately there isn't a good hook for this, the best I've seen is that each request has it's own asyncio task, so theoretically it should close the trace when the task is finished (add_done_callback). Thoughts? I could try to come up with something.
Apologies for the very long response time from our team on this ticket.
This is an issue we'd like to fix, and it's been in our bugfix backlog for a while. If I understand correctly, the most promising fix involves calling add_done_callback on the asyncio task underlying an individual aiohttp request. This behavior could potentially be released behind a feature flag if it changes the behavior of existing synchronous tracing code.