dd-trace-py icon indicating copy to clipboard operation
dd-trace-py copied to clipboard

Harakiri & uwsgi

Open wojcikm opened this issue 2 years ago • 8 comments

We run django app in kubernetes and use this library but we are unable to get logs from errors when harakiri occurs - therefore no logs with 502|504 in tracing. It looks like that worker is killed before it gets a chance to send information via library to dd-agent which is running on cluster. Is it doable to pass this information before worker is killed? We tried several options (eg harakiri-no-arh=true), but none of them worked.

uwsgi==2.0.20
uwsgitop==0.11
Django==4.0.4
python:3.10.4
ddtrace==0.52.1
[uwsgi]
    stats = :2999
    master = True
    processes = 8
    chdir = /opt/
    disable-logging = true
    log-4xx = false
    log-5xx = true
    log-slow = 2000
    buffer-size = 30000
    harakiri = 8
    harakiri-verbose = true
    worker-reload-mercy = 30
    vacuum = True
    single-interpreter = True
    need-app = True
    listen = 120
    close-on-exec = True
    wsgi-file = service/run.py
    ignore-sigpipe = True
    ignore-write-errors = True
    enable-threads = True
    threads = 2
    die-on-term = True
    lazy-apps = True
    thunder-lock = True
    backtrace-depth = 1000
    cache2 = name=business_service_cache,bitmap=1,keysize=2048,maxitems=1000
    py-tracebacker = /tmp/uwsgi-tracebacker-socket
    master-fifo = /tmp/fifofile
    enable-metrics = true
    plugin = dogstatsd
    stats-push = dogstatsd:$(DD_AGENT_HOST):8125,booksy.api.uwsgi

wojcikm avatar Jun 03 '22 09:06 wojcikm

hey @wojcikm thank you for opening the issue, and sorry you are seeing this problem.

Our library doesn't register any after request hooks with uWSGI so harakiri-no-arh won't have any impact. That option also determines whether after request hooks should be counted towards the harakiri time or not, and not whether they still get called when harakiri occurs.

Unfortunately uWSGI does a SIGKILL on any worker than exceeds it's harakiri time which means we don't have any opportunity to do a final close/flush of any spans :(

There is this related open issue on uWSGI: https://github.com/unbit/uwsgi/pull/2311

brettlangdon avatar Jun 03 '22 13:06 brettlangdon

@brettlangdon uwsgi supports graceful shutdown, so how to add shutdown hook on ddtrace?

Liubey avatar Apr 14 '23 02:04 Liubey

@Liubey I am not familiar with this uwsgi hook, do you have docs or an example usage for it?

brettlangdon avatar Apr 14 '23 13:04 brettlangdon

@brettlangdon I'm confused about this too😂. Try to search uwsgi/django docs, but nothing. Just seeking for advice.

Liubey avatar Apr 17 '23 08:04 Liubey

It looks like https://github.com/unbit/uwsgi/pull/2523 added some additional configuration options for graceful shutdown timeouts to give the application a chance to cleanup before exiting.

But this isn't shipped yet, it will be added in 2.0.22 it seems.

brettlangdon avatar Apr 17 '23 15:04 brettlangdon

@brettlangdon Hi, thank you for your comments. But I think just "graceful shutdown timeouts" can not help us to finish span or flush reporter. Maybe we need catch the "harakiri signal" or "callback when the worker killed", just like JVM's shutdown hook.

Liubey avatar Apr 18 '23 02:04 Liubey

I've managed to get this working in a Django application using v2.0.22 where a SIGTERM handler is registered in middleware. Middleware is used so the request object can be bound to the handler function.

Looks a bit like this:

import functools
import signal
import threading
from typing import Any, Callable

import ddtrace
from ddtrace import constants as ddtrace_constants
from ddtrace.contrib.django import utils as ddtrace_django_utils
from django import http


def _run_graceful_shutdown(request: http.HttpRequest, signum: int, frame: Any) -> None:
    """
    Attempt to gracefully shutdown the application before uWSGI kills the process.
    """
    root_span = ddtrace.tracer.current_root_span()
    if root_span is not None:
        root_span.error = 1
        root_span.set_tag_str(ddtrace_constants.ERROR_MSG, "uWSGI graceful shutdown")
        root_span.set_tag_str(ddtrace_constants.ERROR_TYPE, "uWSGI timeout")

        ddtrace_django_utils._after_request_tags( 
            # Pin isn't used in `_after_request_tags` so we don't pass one. Am not sure how to
            # load/find the appropriate pin object anyhow.
            pin=None,  
            span=root_span,
            request=request,
            # This response won't actually be returned by the application, but this is what a
            # reverse proxy like Nginx will return if the application doesn't respond in time,
            # which is the typical behaviour when uWSGI kills the process.
            response=http.HttpResponse(status=504),
        )

    current_span = ddtrace.tracer.current_span()
    if current_span is not None:
        current_span.finish_with_ancestors()


class GracefulShutdown:

    def __init__(self, get_response: Callable[[http.HttpRequest], http.HttpResponse]) -> None:
        self.get_response = get_response

    def __call__(self, request: http.HttpRequest) -> http.HttpResponse:
        # Only register signal handler if in main thread.
        if threading.current_thread() is threading.main_thread():
            # Bind the request object to the handler.
            handler = functools.partial(_run_graceful_shutdown, request)
            signal.signal(signal.SIGTERM, handler)

        return self.get_response(request)

and uwsgi.ini:

[uwsgi]
strict = true

master = true
enable-threads = true
single-interpreter = true

# Specify what module to run.
chdir = /app
module = path.to.wsgi:application

# Specify what host:port to bind to.
http = 0.0.0.0:8000

processes = 4
threads = 1

# Slow request handling.
harakiri = 10
harakiri-graceful-timeout = 4

# Configuration for Datadog.
lazy-apps = true
import = ddtrace.bootstrap.sitecustomize

Harakiri-ed requests get logged in Datadog as a 504 response.

I'm sure this implementation can be improved though!

For one, I couldn't get the signal handler to be called consistently with threads > 1. TBH, I'm don't fully understand the interplay between uWSGI's processes, threads and OS signals - and eventually gave up and set threads = 1.

Will update if I get a better understanding.

codeinthehole avatar Sep 09 '23 11:09 codeinthehole

@codeinthehole maybe you need get trace span from "Request", not get current_root_span. In multi-thread context, you may get root_span not of the current request?

By the way, what default value for threads. I can not find from the docs.

Liubey avatar Oct 13 '23 08:10 Liubey

This issue has been automatically closed after a period of inactivity. If it's a feature request, it has been added to the maintainers' internal backlog and will be included in an upcoming round of feature prioritization. Please comment or reopen if you think this issue was closed in error.

github-actions[bot] avatar May 09 '24 00:05 github-actions[bot]