puppet-on-openwrt
puppet-on-openwrt copied to clipboard
puppet process not terminated when a puppet runs completes
I have disabled the puppet service and I am triggering puppet runs manually for now.
Even after the transaction is reported completed a ruby process remains:
root@OpenWrt:~# ps w|grep pup
1686 root 1492 S grep pup
root@OpenWrt:~#
root@OpenWrt:~# puppet agent --server=puppet --onetime --debug
/usr/lib/ruby/1.9/yaml.rb:56:in `<top (required)>':
It seems your ruby installation is missing psych (for YAML output).
To eliminate this warning, please install libyaml and reinstall your ruby.
Debug: Failed to load library 'selinux' for feature 'selinux'
Debug: Using settings: adding file resource 'confdir': 'File[/etc/puppet]{:path=>"/etc/puppet", :ensure=>:directory, :loglevel=>:debug, :links=>:follow, :backup=>false}'
Debug: Failed to load library 'shadow' for feature 'libshadow'
Debug: Failed to load library 'ldap' for feature 'ldap'
...
Debug: /File[/etc/puppet/state/last_run_summary.yaml]: Autorequiring File[/etc/puppet/state]
Debug: /File[/etc/puppet/state/last_run_report.yaml]: Autorequiring File[/etc/puppet/state]
Debug: /File[/etc/puppet/state/graphs]: Autorequiring File[/etc/puppet/state]
Debug: /File[/var/log/puppet]/ensure: created
Debug: /File[/var/run/puppet]/ensure: created
Debug: Finishing transaction 8656680
root@OpenWrt:~# date; ps w|grep pup; ls -ltr /etc/puppet/state/
Thu Apr 24 21:01:42 CEST 2014
1757 root 32900 S ruby /usr/bin/puppet agent --server=puppet --onetime --debug
1957 root 32900 S ruby /usr/bin/puppet agent --server=puppet --onetime --debug
1966 root 1492 S grep pup
drwxr-xr-x 2 root root 1024 Apr 20 17:21 graphs
-rw-rw---- 1 root root 27629 Apr 24 20:51 state.yaml
-rw-r--r-- 1 root root 432 Apr 24 20:51 last_run_summary.yaml
-rw-r----- 1 root root 21197 Apr 24 20:51 last_run_report.yaml
-rw-r--r-- 1 root root 4 Apr 24 21:00 agent_catalog_run.lock
-rw-r----- 1 root root 231 Apr 24 21:01 resources.txt
-rw-r----- 1 root root 44 Apr 24 21:01 classes.txt
The lock file is removed several minutes later but even then one process is still running:
root@OpenWrt:~# date; ps w|grep pup; ls -ltr /etc/puppet/state/
Thu Apr 24 21:07:35 CEST 2014
1957 root 32900 S ruby /usr/bin/puppet agent --server=puppet --onetime --debug
1997 root 1492 S grep pup
drwxr-xr-x 2 root root 1024 Apr 20 17:21 graphs
-rw-r----- 1 root root 231 Apr 24 21:01 resources.txt
-rw-r----- 1 root root 44 Apr 24 21:01 classes.txt
-rw-rw---- 1 root root 27629 Apr 24 21:06 state.yaml
-rw-r--r-- 1 root root 443 Apr 24 21:06 last_run_summary.yaml
-rw-r----- 1 root root 21206 Apr 24 21:06 last_run_report.yaml
Hmm. I'm not sure :(
Is there anything sensitive in the debug? If not, can you give me a full pastebin or gist?
Can you see if this happens with an empty catelog? Also does it happen when you just use something like:
puppet apply -e "notify { bla:}"
Lets see if we can narrow it down. Maybe an exec or something that gets orphaned?
Hi,
I may have a clue how this happens. It looks like puppet has severe problems on fork. Any exec may leak a puppet instance....
I have a WNDR3800 where this happens all the time... I'm combining the following workaround:
- Make retry and timeouts work (see https://tickets.puppetlabs.com/browse/PUP-2299 for details)
- Patch puppet/util.rb to sleep for 0.5 seconds before running the target code block
- Watching for puppet childs that run longer than 1 minute and auto-kill them with the following script (triggered by cron)
#!/bin/ash
OLDPIDS=$(cat /tmp/puppet.pids)
echo > /tmp/puppet.pids
PIDS=$(grep -H ruby /proc/[0-9]*/cmdline|sed 's:/proc/\([0-9]*\)/.*:\1:'|xargs echo)
for PID in $PIDS; do
PPID=$(cat /proc/$PID/status |grep PPid|sed 's:[^0-9]*\([0-9]\)[^0-9]*:\1:g')
if [ -d /proc/$PID ] && echo $PIDS | grep $PPID ; then
if echo $OLDPIDS | grep $PID; then
kill -9 $PID
else
echo $PID >> /tmp/puppet.pids
fi
fi
done
This makes it work ok-ish. I'm still getting zombies, but no full processes.
Those workaround do work somehow and I can 100% confirm that it is related to puppet/util.rb:safe_posix_fork (actual forking) and util/execution.rb:execute waiting for the pid to end (Process.waitpid2). The Process ruby doc has some hints (like setting WNOHANG), but I can't get it to work in a 100% reliable manner. I guess s.o. with better ruby knowledge could fix it??
That said it should be possible to reproduce with a
export CAT=$(for i in $(seq 20); do echo "exec {'c$i': command => '/sbin/uci get network.loopback.ifname', timeout => '10'}" ; done) ; puppet apply -e "$CAT"
This will cause 20 execs and should trigger the problem.
Gosh. I'm really not sure what to do about this.
As another workaround, does provider=>shell work better? Or does it just result in even more zombies?
I found provider => shell to be worse....
@solarkennedy can you reproduce the problem?
I can not reproduce the problem on non-openwrt machines (I've just clocked my laptop at 800MHz and everything worked just fine)
I'm currently guessing that it might be related to uclibc... Google reveales several historic problems related to uclibc and forks.
Testing that might take quite some time, though....
I've tried to reproduce this in a docker image, and failed, but that doesn't really mean much.
I can try to reproduce it on real hardware this weekend.
Even if I did, I'm not sure I could help actually solve this :(