tcpdump child_cleanup: reap as many child processes as possible

child_cleanup: reap as many child processes as possible

Open martinetd opened this issue 2 years ago • 10 comments

Hello, one of our customer reported zombie processes when running tcpdump under load with -C/-z and I've been able to reproduce. Here's a patch suggestion, but feel free to use something else -- waitpid is also mandated by posix so there should be no portability issue (same requirement as wait) but I've only tested this on linux.

Commit message below: Under load it's possible multiple children have been killed before we start processing the SIGCHILD signal, leaving zombie processes behind everytime we miss a process. Reap as many processes as possible instead of assuming one handler call = one process like we currently did.

Can be reproduced by running the following commands in parallel:

tcpdump -i lo -w /tmp/test -C 1 -z /usr/bin/true
iperf3 -s
iperf3 -c localhost

Jan 21 '22 01:01 martinetd

Hello,

sorry for being annoying, is there anything I could do to help getting this merged? I'm in no particular hurry, but I tend to forget the PRs I've opened if they don't get any reply in a while so prefer striking when the iron is hot :)

Thanks!

Feb 07 '22 09:02 martinetd

Hello, It's been a bit over a month so it's time for my next being-the-annoying-guy here.

Once again please tell me if there's anything I can do to help get this merged: this is a real bug that affected one of our customers.

It might not be a huge bug, but leaving zombies around consumes memory and there's no harm in reaping these properly.

Thank you.

Mar 17 '22 07:03 martinetd

@fxlb @infrastation sorry for the direct mention, would either of you be able to take a look eventually?

It's not a huge deal but I don't like keeping PRs in limbos, if there's anything I can do I'll be happy to update anything.

Thanks!

Apr 11 '22 11:04 martinetd

I don't think one test on Linux is enough. There should be tests at least on macOS, some *BSD, Windows. Is using /usr/bin/true relevant is a test?

Apr 13 '22 09:04 fxlb

Thanks!

I can setup some BSD to test in the next few days if that helps, but I don't have any macOS or windows around.

The condition to produce zombies is to have multiple children die fast enough so that multiple children deaths happen by the time tcpdump processes the sigchld (new signals aren't sent until we're done processing one); so it doesn't have to be true at all, what matters is producing lots of children quickly. Our user reproduced it with a normal log rotation script. As far as test goes though, just confirming the signal handler reaps children as normal (doesn't hang, doesn't leave zombies behind for normal usage with single child death per signal) is probably enough -- there's no reason the loop wouldn't work if that does. It looks like to me that some of the CIs here aren't just builds but also run tests -- I assume there's some rotation test in there so it probably at least doesn't hang on tested platforms... Hard to tell for zombies though.

Apr 13 '22 10:04 martinetd

"Child process" is the proper term here.

Apr 13 '22 10:04 infrastation

Sorry for the poor naming - I didn't realize. I've renamed the subject, and amended commit message.

I had a freeBSD VM laying around, and was able to easily confirm both the problem and solution:

from compiled master branch:

# ./tcpdump -i lo0 -w /tmp/trace -C 1 -z /usr/bin/true
# iperf3 -s
# iperf3 -c localhost
# ps -edfww
3322 u0  S+   0:06.77 `-- USER=root LOGNAME=root HOME=/root SHELL=/bin/csh BLOCKSIZE=K MAIL=/var/mail/root MM_CHARSET=UTF-8 LANG=C.UTF-8 PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/root/bin TERM=vt100 HOSTTYPE=FreeBSD VENDOR=amd OSTYPE=FreeBSD MACHTYPE=x86_64 SHLVL=1 PWD=/root/tcpdump GROUP=wheel HOST=freebsd EDITOR=vi PAGER=less ./tcpdump -i lo0 -w /tmp/trace -C 1 -z /usr/bin/true
3567 u0  Z<+  0:00.01   `--  <defunct>

After updating to refs/pull/972/merge, I couldn't reproduce anymore. I've confirmed pid growth so tcpdump forked and reaped its child processes successfully.

Apr 13 '22 12:04 martinetd

Hello,

it's been another couple of weeks, so I'm going to be annoying again.

I still don't have a windows, but the comment above the child_cleanup function explicitly states that On windows, we do not use a fork, so we do not care less about waiting a child processes to die with ifdefs that I read as the function isn't compiled on windows, so I don't think this particularly requires testing there.

It works on linux and freeBSD, both of which exhibit the problem without the patch and no longer do after patching.

Is there anything else you're waiting for?

Apr 29 '22 12:04 martinetd

Does anyone see a problem with this change?

May 14 '22 19:05 fxlb

Hello,

I had forgotten about this, there doesn't look like anyone is opposing so would it be possible to merge?

Thank you.

Aug 05 '22 07:08 martinetd

. . . ping?

Feb 03 '23 04:02 martinetd

I'm glad someone mentioned this issue. When I used - z to call the shell script I wrote when I generated more than 250MB per second (50MB without files), I encountered this problem. Even if I used kill - 9, I could not kill the zombie process. 300+zombie process have been accumulated in 4 hours. I used to think there was a problem with my script.

Apr 13 '23 07:04 yshcheng

zombie processes are already "dead", you cannot kill them again; tcpdump needs to tell the kernel they can be cleaned up, as this patch does by calling waitpid as many times as necessary. There really isn't any workaround to this, playing games to make sure only one child process terminates at a time before the signal handler has been called is just going to slow everything down... It's much simpler to rebuild with this patch.

Apr 13 '23 21:04 martinetd

That should work - it won't hang, due to the WNOHANG, and the loop will exit if there are no zombies hanging around to have their status picked up, as waitpid() would return -1 with ECHLD in that case.

Apr 13 '23 22:04 guyharris

This looks correct to me: multiple child processes exiting can result in only a single SIGCHLD, so looping with waitpid() to reap them all is correct.

Apr 13 '23 23:04 fenner

After the message on the mailing list on 13/04/2023, there is no argument against this PR. Let's wait a few more days. I propose to merge it next week.

Apr 16 '23 19:04 fxlb

Thank you!

Apr 19 '23 09:04 fxlb

Thanks!

Apr 19 '23 10:04 martinetd

tcpdump tcpdump copied to clipboard

child_cleanup: reap as many child processes as possible

tcpdump
tcpdump copied to clipboard