Orphaned processes are left when stopped by runit (runsvdir / runsv)
I am running stud under runit, and I have a problem that occurs when a lot of connections are established. Stopping the service (via sv stop) leaves the stud processes orphaned: they stay alive, keep serving existing connections, and remain bound to the port. This prevents sv start from running new stud processes, because they cannot listen on the port that is still bound.
Everything works fine when there are no established connections. I can start and stop the service as many times as I want and get the desired effect.
I can reproduce it easily:
$ ps axjf
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
28073 28074 28071 28071 ? -1 S 1000 0:00 \_ runsvdir /home/messem/service
28074 28083 28071 28071 ? -1 S 1000 0:00 \_ runsv stud-38081
28083 20594 28071 28071 ? -1 S 1000 0:00 \_ /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
20594 20595 28071 28071 ? -1 S 1000 0:01 \_ /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
20594 20596 28071 28071 ? -1 S 1000 0:01 \_ /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
20594 20597 28071 28071 ? -1 S 1000 0:01 \_ /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
20594 20598 28071 28071 ? -1 S 1000 0:01 \_ /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
$ sv stop ./stud-38081
ok: down: ./stud-38081: 1s, normally up
$ ps axjf
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
28073 28074 28071 28071 ? -1 S 1000 0:00 \_ runsvdir /home/messem/service
28074 28083 28071 28071 ? -1 S 1000 0:00 \_ runsv stud-38081
$ sv start ./stud-38081
ok: run: ./stud-38081: (pid 1500) 0s
$ ps axjf
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
28073 28074 28071 28071 ? -1 S 1000 0:00 \_ runsvdir /home/messem/service
28074 28083 28071 28071 ? -1 S 1000 0:00 \_ runsv stud-38081
28083 1500 28071 28071 ? -1 S 1000 0:00 \_ /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
1500 1501 28071 28071 ? -1 S 1000 0:00 \_ /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
1500 1502 28071 28071 ? -1 S 1000 0:00 \_ /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
1500 1503 28071 28071 ? -1 S 1000 0:00 \_ /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
1500 1504 28071 28071 ? -1 S 1000 0:00 \_ /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
## [Established 1000 connections with stud] ##
$ sv stop ./stud-38081
ok: down: ./stud-38081: 1s, normally up
$ ps axjf
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
28073 28074 28071 28071 ? -1 S 1000 0:00 \_ runsvdir /home/messem/service
28074 28083 28071 28071 ? -1 S 1000 0:00 \_ runsv stud-38081
1 20595 28071 28071 ? -1 S 1000 0:01 /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
1 1501 28071 28071 ? -1 S 1000 0:01 /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
1 1502 28071 28071 ? -1 S 1000 0:01 /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
1 1503 28071 28071 ? -1 S 1000 0:01 /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
1 1504 28071 28071 ? -1 S 1000 0:01 /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
As you can see in the last listing, the processes are no longer under runsv. The parent process with PID 1500 is gone, but the children are still alive.
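For anyone who wants to check for this state automatically, here is a minimal sketch that scans `ps axjf` output for processes that have been reparented to PID 1. The function name `find_orphans` and the `SAMPLE` text are hypothetical helpers for illustration. The sketch assumes the ten-column layout shown in the listings above (PPID first, PID second, COMMAND last).

```python
# Hypothetical helper: not part of stud or runit.
# SAMPLE reuses a few lines from the last listing above.
SAMPLE = r"""PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
28073 28074 28071 28071 ? -1 S 1000 0:00 \_ runsvdir /home/messem/service
28074 28083 28071 28071 ? -1 S 1000 0:00 \_ runsv stud-38081
1 20595 28071 28071 ? -1 S 1000 0:01 /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
1 1501 28071 28071 ? -1 S 1000 0:01 /usr/bin/stud -s -n 4 -u messem -b messemhost 8081 -f messemhost 38081
"""


def find_orphans(ps_output, command_substring):
    """Return PIDs whose parent is PID 1 and whose command contains the substring."""
    orphans = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to hold all ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        ppid, pid = int(fields[0]), int(fields[1])
        # Everything from the tenth column onward is the command line.
        command = " ".join(fields[9:])
        if ppid == 1 and command_substring in command:
            orphans.append(pid)
    return orphans


print(find_orphans(SAMPLE, "/usr/bin/stud"))
```

On a live system the same function could be fed the captured output of `ps axjf` instead of `SAMPLE`. Note that a PPID of 1 only indicates an orphan when the process was not started by PID 1 in the first place.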
+1
I'm happy to accept a patch, but I don't use this particular manner of running stud, so I can't say much about what's going on here.
Stud is designed to kill its children when it gets a TERM signal, and my testing confirms it does this correctly (Linux/x64). Does runit not send TERM to the parent?
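To make the expected behaviour concrete, here is a rough sketch of the "parent forwards TERM to its workers" pattern. It is written in Python for brevity and is not stud's actual code (stud is written in C). The names `WorkerParent`, `forward_term`, and `install_handler` are made up for this example, and the workers are simple idle processes standing in for stud's children.

```python
import signal
import subprocess
import sys

# Stand-in for a worker: idles until it is signalled.
WORKER_CODE = "import time\nwhile True:\n    time.sleep(1)"


class WorkerParent:
    """Hypothetical parent that owns a set of worker processes."""

    def __init__(self, worker_count):
        self.workers = [
            subprocess.Popen([sys.executable, "-c", WORKER_CODE])
            for _ in range(worker_count)
        ]

    def forward_term(self):
        """Send TERM to every worker, then wait for each one to exit."""
        for worker in self.workers:
            worker.send_signal(signal.SIGTERM)
        for worker in self.workers:
            worker.wait()

    def install_handler(self):
        """Make the parent forward TERM and exit when it is signalled itself."""
        def on_term(signum, frame):
            self.forward_term()
            sys.exit(0)
        # Must be called from the main thread of the parent process.
        signal.signal(signal.SIGTERM, on_term)
```

If the parent exits without running something like `forward_term` (or the workers ignore or delay TERM), the workers outlive it and get reparented, which is the state shown in the report above.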
Maybe the problem comes from the CONT signal:
Down. If the service is running, send it a TERM signal,
and then a CONT signal. If ./run exits, start ./finish if
it exists. After it stops, do not restart service.
Is that possible?
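For experimenting with this outside of runit, the quoted "TERM, then CONT" sequence can be approximated as below. This is only a sketch of the signal order described in the quote, not runsv's implementation. The helper name `stop_like_down` is hypothetical, and it assumes the signals go to a single supervised PID, not to that process's children.

```python
import signal
import subprocess
import sys


def stop_like_down(proc, timeout=5.0):
    """Send TERM and then CONT to one process, and return its exit status.

    CONT matters for a process that is currently stopped: a stopped process
    does not act on TERM until it is continued.
    """
    proc.send_signal(signal.SIGTERM)
    proc.send_signal(signal.SIGCONT)
    # Popen.wait() returns -N when the process was killed by signal N.
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    # Stand-in for the supervised service: an idle process with default signal handling.
    service = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(stop_like_down(service))
```

Swapping the stand-in for a stud parent with established connections would show whether the parent's own TERM handling, and not the CONT that follows, is what leaves the children behind.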
Yeah, that might be it. I'll play around with sending CONT to the children and see what happens.
Any news?
Nope, no news except for random people googling and wondering.