salt icon indicating copy to clipboard operation
salt copied to clipboard

[BUG] service.running hangs for start/restart of daemons with poor fd hygiene

Open hblanks opened this issue 2 years ago • 0 comments

Description

Salt's service state as of 3005 still hangs when handling start or restart for some daemons. This defect was noted as recently as August 2022 (https://github.com/saltstack/salt/issues/56231#issuecomment-1222559491).

I've provided a test case using salt's underlying code paths which demonstrates at least the core of what's happening; it runs fine on both a Linux and FreeBSD machine. This seems to come up for people with some regularity especially around things like PostgreSQL, which has a somewhat peculiar way of starting up. It came up for my with rspamd.

In short, salt's service module relys on salt's standard command runner, which in turn just uses subprocess.Popen.Communicate() for subprocesses. That's great for one-off executions but a problem for daemonization, where salt can start trying to read from the subprocess's stdout before daemonization is complete (and presumably sometimes instead of calling wait()? in the process tree for service.running states I'd see salt waiting on a defunct process. . . which usually is what happens when you don't call wait()).

Anyway. I suspect any system that's not using systemd is going to have this problem for at least some daemons. The easiest fix is making the service module invoke cmd with a timeout, but you could also double down on your threading I guess and wait() on the child process while also trying to communicate(). . . I suppose.

So, I hope that's a help. It's a bit of a bummer to see the last mention of this just get closed after 2 years even as people are having the problem still. . . after the same auto-close happened earlier, too:

  • closed in 2020: https://github.com/saltstack/salt/issues/56231
  • also closed in 2020: https://github.com/saltstack/salt/issues/44848#issuecomment-624489416

See also:

  • An early characterization of what happens, including replies from Hatch: https://groups.google.com/g/salt-users/c/wymM8NrslNw
  • https://github.com/saltstack-formulas/postgres-formula/pull/264/files
  • https://bugs.python.org/issue23213

Setup

  • [x] VM (kvm)

Steps to Reproduce the behavior

Test case, using sources below:

CLOSE_FDS=1 time python3 ./salt_run.py  # works fine, exits immediately
CLOSE_FDs=0 time python3 ./salt_run.py  # hangs until the daemonized process exists

Or, try this on a FreeBSD VM with a daemon like rspamd or postgresql.

Expected behavior

Salt's service.running state does not hang indefinitely or until one runs service XXX restart for non-systemd machines and for services where daemonization doesn't work as expected.

Screenshots

N/A

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3005

Dependency Versions:
          cffi: 1.15.1
      cherrypy: Not Installed
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.0.1
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.4
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.15.0
        pygit2: Not Installed
        Python: 3.9.16 (main, Dec 17 2022, 01:13:33)
  python-gnupg: Not Installed
        PyYAML: 5.4.1
         PyZMQ: 22.3.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: freebsd 13.1
        locale: utf-8
       machine: amd64
       release: 13.1-RELEASE
        system: FreeBSD
       version: FreeBSD 13.1

Additional context

Test case sources:

==> daemonize.py <==
#!/usr/bin/env python3

"""
Demonstrates a daemonizing process that either does or doesn't close
its parent's stdin/stdout/stderr before starting work.

CLOSE_FDS=1 ./daemonize.py  # close fds
CLOSE_FDS=1 ./daemonize.py  # don't close fds
"""

import os
import sys
import time

# Riffed from
# http://web.archive.org/web/20131017130434/http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/
# but it's just the usual daemonization (fork, setsid, fork, maybe close
# fds, and start the work.)
def daemonize(close_fds):
    pid = os.fork()
    if pid > 0:
        sys.exit(0) # exit first parent

    # decouple from parent env
    os.setsid()

    pid = os.fork()
    if pid > 0:
        sys.exit(1) # exit second parent

    if close_fds:
        # We have to close both stdout & stderr b/c
        # cmdmod.py's retcode() aliases stderr=subprocess.STDOUT.
        sys.stdout.flush()
        sys.stderr.flush()
        fw = open('/dev/null', 'wb')
        os.dup2(fw.fileno(), sys.__stdout__.fileno())
        os.dup2(fw.fileno(), sys.__stderr__.fileno())

        # We don't seem to need to close stdin, though, in order for
        # salt_run.py to terminate.
        # fr = open('/dev/null', 'rb')
        # os.dup2(fr.fileno(), sys.__stdin__.fileno())

    print("sleeping for 5s", file=sys.stderr)
    try:
        time.sleep(5)  # pretend our daemon only ran for 5s
    except Exception as e:
        print("error: %s" % e, file=sys.stderr)
        raise
    print("done", file=sys.stderr)

if __name__ == '__main__':
    close_fds = int(os.environ.get('CLOSE_FDS', '0'))
    daemonize(close_fds)

==> salt_run.py <==
#!/usr/bin/env python

"""
Test that salt's timed_subprocess hangs when running a daemon process
that doesn't close off its stdin/stdout/stderr.

Usage:

CLOSE_FDS=1 ./salt_run.py  # works fine
CLOSE_FDS=0 ./salt_run.py  # hangs
"""

import subprocess
import sys

import salt.utils.timed_subprocess

proc = salt.utils.timed_subprocess.TimedProc(
    # use our executable since FreeBSD names python3 python39, etc.
    '%s ./daemonize.py' % sys.executable,
    shell=True,
    stdout=subprocess.PIPE, 
    stderr=subprocess.STDOUT)
proc.run()

print('done!')

hblanks avatar Dec 25 '22 21:12 hblanks