
Gunicorn server startup failure messes up signal handling and affects further calls to `subprocess.run`

Open winwinashwin opened this issue 2 years ago • 1 comment

Summary

  1. Start a custom gunicorn application with an invalid port (say -1), then catch and ignore the OverflowError it raises.
  2. Every subsequent call to subprocess.run in the same process returns exit code 0, regardless of the command passed to it.

MRE

from gunicorn.app.base import BaseApplication
import subprocess

# begin section: gunicorn
class GunicornApp(BaseApplication):
    def __init__(self, app, **options):
        self.options = options
        self.wsgi_app = app
        super().__init__()

    def load_config(self):
        for key, value in self.options.items():
            self.cfg.set(key.lower(), value)

    def load(self):
        return self.wsgi_app


def app(environ, start_response):
    data = b'Hello, World!\n'
    status = '200 OK'
    response_headers = [
        ('Content-type', 'text/plain'),
        ('Content-Length', str(len(data)))
    ]
    start_response(status, response_headers)
    return iter([data])


port = -1
server = GunicornApp(app, bind=f"localhost:{port}")
try:
    server.run()  # Expected failure
except OverflowError:
    pass
# end section: gunicorn

proc = subprocess.run("exit 1", shell=True, capture_output=True, text=True)
assert proc.returncode == 1, f"Got returncode = {proc.returncode}"

Environment

OS: Ubuntu 20.04.6

$ python --version
Python 3.8.10

$ pip freeze | grep gunicorn
gunicorn==21.2.0

Attempts at finding the root cause

I suspect this has something to do with improper cleanup of the signal handling set up in init_handlers of arbiter.py.

I generated function call traces of the above MRE with and without the gunicorn section and did a simple vimdiff:

python -m trace --trace main.py 2>/dev/null > trace.out

subprocess.run internally calls os.waitpid(pid, wait_flags). Here pid should be the process ID of the command it executes (exit 1 in the MRE). In the MRE scenario, however, the pid passed to waitpid is the gunicorn worker pid, which no longer exists. os.waitpid raises ChildProcessError, and subprocess sets the exit code to 0 when handling the exception.
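To check that a leftover SIGCHLD handler alone can produce this symptom, here is a minimal sketch that does not involve gunicorn at all. It is not a confirmed diagnosis: `reap_children` is a hypothetical stand-in that I am assuming behaves like a handler reaping every exited child, and the sketch only shows that subprocess reports 0 once something else has already collected the child's status.

```python
import os
import signal
import subprocess
import time

reaped = []  # PIDs collected by the handler instead of by subprocess


def reap_children(signum, frame):
    """Hypothetical leftover handler: reap every exited child, whoever started it."""
    try:
        while True:
            pid, _status = os.waitpid(-1, os.WNOHANG)
            if pid == 0:  # children exist, but none has exited yet
                break
            reaped.append(pid)
    except ChildProcessError:
        pass  # no children left to reap


previous = signal.signal(signal.SIGCHLD, reap_children)
try:
    proc = subprocess.Popen("exit 1", shell=True)
    # Give the handler time to steal the child's exit status.
    deadline = time.monotonic() + 5.0
    while proc.pid not in reaped and time.monotonic() < deadline:
        time.sleep(0.01)
    # The child is already reaped, so the internal os.waitpid raises
    # ChildProcessError and subprocess falls back to exit status 0.
    returncode = proc.wait()
finally:
    # Put the earlier handler back (SIG_DFL if it was not set from Python).
    signal.signal(signal.SIGCHLD, previous if previous is not None else signal.SIG_DFL)

print(f"shell exited with 1, subprocess reported {returncode}")
```

On CPython on Linux I would expect the reported code to be 0 here even though the shell exited with 1, which matches the behaviour in the MRE.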

Maybe related issue: https://github.com/encode/uvicorn/issues/894
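As a possible stopgap on the caller's side, the handlers could be saved before the server starts and restored once it fails. This is only a sketch under the assumption that the stale handlers are the whole problem; `preserve_signal_handlers` is a hypothetical helper, not part of gunicorn, and it must run in the main thread.

```python
import signal
import subprocess
from contextlib import contextmanager


@contextmanager
def preserve_signal_handlers():
    """Hypothetical helper: restore the signal handlers that were active on entry."""
    saved = {}
    for sig in signal.valid_signals():
        try:
            handler = signal.getsignal(sig)
        except (ValueError, OSError):
            continue
        if handler is not None:  # None means it was not installed from Python
            saved[sig] = handler
    try:
        yield
    finally:
        for sig, handler in saved.items():
            try:
                signal.signal(sig, handler)
            except (ValueError, OSError, RuntimeError):
                pass  # e.g. SIGKILL and SIGSTOP cannot be changed


# Stand-in for a failed server start that leaves a SIGCHLD handler behind.
before = signal.getsignal(signal.SIGCHLD)
with preserve_signal_handlers():
    signal.signal(signal.SIGCHLD, lambda signum, frame: None)

proc = subprocess.run("exit 1", shell=True, capture_output=True, text=True)
```

In the MRE, the `try: server.run()` block would go inside the `with` statement; I have not verified that this covers everything the arbiter changes.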

winwinashwin, Feb 10 '24 17:02

I think this might be solved by #3148, at least with the change set that I primarily propose. At least I didn't hit the assertion when trying it out with that.

sylt, Feb 11 '24 19:02