dd-trace-py
Harakiri & uwsgi
We run a Django app in Kubernetes and use this library, but we are unable to get logs from errors when harakiri occurs, so no traces with 502/504 statuses show up. It looks like the worker is killed before it gets a chance to send the information via the library to the dd-agent running on the cluster. Is it possible to pass this information on before the worker is killed? We tried several options (e.g. harakiri-no-arh = true), but none of them worked.
uwsgi==2.0.20
uwsgitop==0.11
Django==4.0.4
python:3.10.4
ddtrace==0.52.1
[uwsgi]
stats = :2999
master = True
processes = 8
chdir = /opt/
disable-logging = true
log-4xx = false
log-5xx = true
log-slow = 2000
buffer-size = 30000
harakiri = 8
harakiri-verbose = true
worker-reload-mercy = 30
vacuum = True
single-interpreter = True
need-app = True
listen = 120
close-on-exec = True
wsgi-file = service/run.py
ignore-sigpipe = True
ignore-write-errors = True
enable-threads = True
threads = 2
die-on-term = True
lazy-apps = True
thunder-lock = True
backtrace-depth = 1000
cache2 = name=business_service_cache,bitmap=1,keysize=2048,maxitems=1000
py-tracebacker = /tmp/uwsgi-tracebacker-socket
master-fifo = /tmp/fifofile
enable-metrics = true
plugin = dogstatsd
stats-push = dogstatsd:$(DD_AGENT_HOST):8125,booksy.api.uwsgi
hey @wojcikm, thank you for opening the issue, and sorry you are seeing this problem.
Our library doesn't register any after-request hooks with uWSGI, so harakiri-no-arh won't have any impact. That option only determines whether after-request hooks are counted towards the harakiri time, not whether they still get called when harakiri occurs.
Unfortunately uWSGI sends SIGKILL to any worker that exceeds its harakiri time, which means we don't get any opportunity to do a final close/flush of any spans :(
There is a related open pull request on uWSGI: https://github.com/unbit/uwsgi/pull/2311
@brettlangdon uWSGI supports graceful shutdown, so how can we add a shutdown hook to ddtrace?
@Liubey I am not familiar with this uwsgi hook, do you have docs or an example usage for it?
@brettlangdon I'm confused about this too 😂. I tried searching the uwsgi/Django docs, but found nothing. Just looking for advice.
It looks like https://github.com/unbit/uwsgi/pull/2523 added some additional configuration options for graceful shutdown timeouts, giving the application a chance to clean up before exiting.
This isn't shipped yet though; it looks like it will land in 2.0.22.
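If I'm reading that PR right, the worker would first receive a catchable signal (SIGTERM, unless overridden) when harakiri fires, and would only be SIGKILLed once the extra grace period has elapsed. Roughly (values are only illustrative):

[uwsgi]
harakiri = 10
# after harakiri triggers, wait this many extra seconds after the graceful
# signal before falling back to SIGKILL
harakiri-graceful-timeout = 4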
@brettlangdon Hi, thank you for your comments. But I think graceful shutdown timeouts alone can't help us finish the span or flush the reporter. Maybe we need to catch the "harakiri signal", or get a callback when the worker is killed, just like the JVM's shutdown hook.
I've managed to get this working in a Django application using v2.0.22, where a SIGTERM handler is registered in middleware. Middleware is used so the request object can be bound to the handler function.
It looks a bit like this:
import functools
import signal
import threading
from typing import Any, Callable

import ddtrace
from ddtrace import constants as ddtrace_constants
from ddtrace.contrib.django import utils as ddtrace_django_utils
from django import http


def _run_graceful_shutdown(request: http.HttpRequest, signum: int, frame: Any) -> None:
    """
    Attempt to gracefully shut down the application before uWSGI kills the process.
    """
    root_span = ddtrace.tracer.current_root_span()
    if root_span is not None:
        root_span.error = 1
        root_span.set_tag_str(ddtrace_constants.ERROR_MSG, "uWSGI graceful shutdown")
        root_span.set_tag_str(ddtrace_constants.ERROR_TYPE, "uWSGI timeout")
        ddtrace_django_utils._after_request_tags(
            # Pin isn't used in `_after_request_tags` so we don't pass one. Am not sure how to
            # load/find the appropriate pin object anyhow.
            pin=None,
            span=root_span,
            request=request,
            # This response won't actually be returned by the application, but this is what a
            # reverse proxy like Nginx will return if the application doesn't respond in time,
            # which is the typical behaviour when uWSGI kills the process.
            response=http.HttpResponse(status=504),
        )

    current_span = ddtrace.tracer.current_span()
    if current_span is not None:
        current_span.finish_with_ancestors()


class GracefulShutdown:
    def __init__(self, get_response: Callable[[http.HttpRequest], http.HttpResponse]) -> None:
        self.get_response = get_response

    def __call__(self, request: http.HttpRequest) -> http.HttpResponse:
        # Only register the signal handler if in the main thread.
        if threading.current_thread() is threading.main_thread():
            # Bind the request object to the handler.
            handler = functools.partial(_run_graceful_shutdown, request)
            signal.signal(signal.SIGTERM, handler)
        return self.get_response(request)
and uwsgi.ini:
[uwsgi]
strict = true
master = true
enable-threads = true
single-interpreter = true
# Specify what module to run.
chdir = /app
module = path.to.wsgi:application
# Specify what host:port to bind to.
http = 0.0.0.0:8000
processes = 4
threads = 1
# Slow request handling.
harakiri = 10
harakiri-graceful-timeout = 4
# Configuration for Datadog.
lazy-apps = true
import = ddtrace.bootstrap.sitecustomize
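To wire the middleware up, it also needs to be listed in Django's MIDDLEWARE setting; the dotted path below is just a placeholder for wherever the GracefulShutdown class lives:

# settings.py -- the module path is hypothetical
MIDDLEWARE = [
    # ... the rest of your middleware ...
    "myproject.middleware.GracefulShutdown",
]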
Harakiri-ed requests get logged in Datadog as a 504 response.
I'm sure this implementation can be improved though!
For one, I couldn't get the signal handler to be called consistently with threads > 1. TBH, I don't fully understand the interplay between uWSGI's processes, threads and OS signals, and eventually gave up and set threads = 1.
Will update if I get a better understanding.
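One likely piece of the puzzle (offered as a guess, not a confirmed diagnosis): CPython only lets you install signal handlers from the main thread, and it also only runs them in the main thread. So with threads > 1, any request served on a secondary thread skips the registration in the middleware above, and whichever handler does fire only has the request that the main thread last bound to it. A tiny standalone illustration of the registration constraint:

import signal
import threading

def handler(signum, frame):
    print("got SIGTERM")

def register():
    # In CPython this raises:
    # ValueError: signal only works in main thread of the main interpreter
    signal.signal(signal.SIGTERM, handler)

t = threading.Thread(target=register)
t.start()
t.join()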
@codeinthehole maybe you need to get the trace span from the request, not from current_root_span. In a multi-threaded context, you may get a root_span that doesn't belong to the current request?
By the way, what is the default value for threads? I cannot find it in the docs.
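If that's the concern, one way to act on it (a rough, untested sketch; it assumes the ddtrace Django integration has already started the request's root span by the time this middleware runs) is to capture the root span in the middleware, while still in the request's own context, and bind it to the handler along with the request instead of calling current_root_span() inside the signal handler:

import functools
import signal
import threading

import ddtrace


def _shutdown_with_bound_span(request, root_span, signum, frame):
    # Works with the span captured for this specific request instead of
    # whatever current_root_span() happens to return at signal time.
    # The 504 tagging from the earlier snippet could also be done here
    # using the bound request.
    if root_span is not None:
        root_span.error = 1
        root_span.finish_with_ancestors()


class GracefulShutdown:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if threading.current_thread() is threading.main_thread():
            # Capture the root span here, in the request's context.
            root_span = ddtrace.tracer.current_root_span()
            handler = functools.partial(_shutdown_with_bound_span, request, root_span)
            signal.signal(signal.SIGTERM, handler)
        return self.get_response(request)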
This issue has been automatically closed after a period of inactivity. If it's a feature request, it has been added to the maintainers' internal backlog and will be included in an upcoming round of feature prioritization. Please comment or reopen if you think this issue was closed in error.