Gunicorn server startup failure messes up signal handling and affects further calls to `subprocess.run`
Summary
- Start a custom gunicorn application with an invalid port (say -1), then catch and ignore the `OverflowError` raised.
- Any subsequent call to `subprocess.run` will always return exit code 0, regardless of the command passed to subprocess.
MRE
from gunicorn.app.base import BaseApplication
import subprocess

# begin section: gunicorn
class GunicornApp(BaseApplication):
    def __init__(self, app, **options):
        self.options = options
        self.wsgi_app = app
        super().__init__()

    def load_config(self):
        for key, value in self.options.items():
            self.cfg.set(key.lower(), value)

    def load(self):
        return self.wsgi_app

def app(environ, start_response):
    data = b'Hello, World!\n'
    status = '200 OK'
    response_headers = [
        ('Content-type', 'text/plain'),
        ('Content-Length', str(len(data)))
    ]
    start_response(status, response_headers)
    return iter([data])

port = -1
server = GunicornApp(app, bind=f"localhost:{port}")
try:
    server.run()  # Expected failure
except OverflowError:
    pass
# end section: gunicorn

proc = subprocess.run("exit 1", shell=True, capture_output=True, text=True)
assert proc.returncode == 1, f"Got returncode = {proc.returncode}"
Environment
OS: Ubuntu 20.04.6
$ python --version
Python 3.8.10
$ pip freeze | grep gunicorn
gunicorn==21.2.0
Attempts at finding the root cause
I suspect this has something to do with the signal handlers installed by init_signals in arbiter.py not being cleaned up when startup fails.
I generated function call traces of the above MRE with and without the gunicorn section and compared them with a simple vimdiff:
python -m trace --trace main.py 2>/dev/null > trace.out
subprocess.run internally makes an os.waitpid(pid, wait_flags) call. Here pid is supposed to be the process ID of the command it executes (exit 1 in the MRE case). However, in the MRE scenario above, the pid passed to waitpid is the gunicorn worker pid, which no longer exists. os.waitpid raises a ChildProcessError, and subprocess sets the exit code to 0 when handling the exception.
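The exit-code-0 fallback can be reproduced without gunicorn. The sketch below is only an illustration of that subprocess behavior, not of gunicorn's code: it assumes a POSIX system with a shell, and the helper name `returncode_after_external_reap` is made up for this example. It reaps the child directly with `os.waitpid` (standing in for whatever else collected the child) before `Popen.wait` gets to it.

```python
import os
import subprocess


def returncode_after_external_reap(command):
    """Run `command`, reap it behind subprocess's back, then ask subprocess.

    Returns (real_exit_code, code_reported_by_subprocess).
    Hypothetical helper for illustration only.
    """
    proc = subprocess.Popen(command, shell=True)

    # Stand-in for some other code (e.g. a SIGCHLD handler) collecting the
    # child's exit status before subprocess does.
    _, status = os.waitpid(proc.pid, 0)
    real_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

    # The child is already reaped, so the os.waitpid call inside wait()
    # raises ChildProcessError, which subprocess handles by reporting 0.
    reported_code = proc.wait()
    return real_code, reported_code


real, reported = returncode_after_external_reap("exit 1")
print(f"real exit code = {real}, subprocess reports = {reported}")
```

On Python 3.8 this shows the real exit code as 1 while subprocess reports 0, which matches the symptom in the MRE.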
Maybe related issue: https://github.com/encode/uvicorn/issues/894
I think this might be solved by #3148, at least with the change-set that I primarily propose. At least, I didn't hit the assertion when trying it out with that.
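Until a fix lands, a possible workaround on the caller's side is to snapshot the process's signal handlers before starting the server and put them back afterwards. This is an untested-against-gunicorn sketch under the assumption that leftover Python-level signal handlers are what interferes with subprocess; `preserved_signal_handlers` is a hypothetical helper, not part of gunicorn, and it must be used from the main thread.

```python
import signal
from contextlib import contextmanager


@contextmanager
def preserved_signal_handlers():
    """Restore any signal handlers that were changed inside the block.

    Hypothetical helper for illustration; main thread only.
    """
    saved = {}
    for sig in signal.valid_signals():
        handler = signal.getsignal(sig)
        # None means the handler was not installed from Python and
        # cannot be re-installed through signal.signal().
        if handler is not None:
            saved[sig] = handler
    try:
        yield
    finally:
        for sig, handler in saved.items():
            # Only touch signals whose handler actually changed.
            if signal.getsignal(sig) is not handler:
                signal.signal(sig, handler)


# Intended usage with the MRE (assumption: restoring handlers is enough):
#
#     with preserved_signal_handlers():
#         try:
#             server.run()
#         except OverflowError:
#             pass
```

Only handlers that differ from the snapshot are re-installed, so signals that cannot be changed (such as SIGKILL) are never passed to `signal.signal`.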