pkg
post-install script locks pkg forever
This bug is happening on pfSense installations since we moved to pkg 1.13.x.
pfSense has extra modules that can be installed to add functionality to the product. These modules are packages built from the pfSense ports tree and consist mostly of static files (PHP, XML, shell scripts) that are copied to their final directories. These packages have pre-deinstall, deinstall and post-install actions, implemented as regular shell scripts (post-install/post-deinstall). All of these actions call a special script: /etc/rc.packages $PKGNAME $ACTION.
rc.packages is a PHP script that reads all information about the current package from its own XML files and executes the expected actions, for example restarting daemons.
Since we moved to pkg 1.13.x, we have been seeing this issue when users reinstall or upgrade one of these packages. The post-install step hangs forever, until the pkg process is killed. I added debug code to /etc/rc.packages and I can see that it is executed and exits as expected.
When it happens, ps shows a defunct process as a child of pkg, as you can see here:
root 82673 0.0 0.3 11352 2148 0 I+ 11:15 0:01.67 | | `-- truss /usr/local/sbin/pkg upgrade -fy pfSense-pkg-Quagga_OSPF-0.6.21_5
root 82808 0.0 1.9 38256 13796 0 IX+ 11:15 0:01.46 | | `-- /usr/local/sbin/pkg upgrade -fy pfSense-pkg-Quagga_OSPF-0.6.21_5
quagga 715 0.0 0.5 13424 3432 - Is 11:15 0:00.01 | | |-- /usr/local/sbin/zebra -d -f /var/etc/quagga/zebra.conf
quagga 1198 0.0 0.5 13836 3624 - Ss 11:15 0:00.27 | | |-- /usr/local/sbin/ospfd -d -f /var/etc/quagga/ospfd.conf
quagga 1856 0.0 0.6 15204 4536 - Is 11:15 0:00.00 | | |-- /usr/local/sbin/bgpd -d -f /var/etc/quagga/bgpd.conf
root 85976 0.0 0.0 0 0 0 Z+ 11:15 0:00.11 | | `-- <defunct>
According to procstat, this defunct process's binary is sh, so I suspect it's the post-install script itself.
I ran this test under truss to try to collect some output, and I'm attaching the full result (output.txt). Please let me know if there is anything I can do to collect more useful data.
I tested pkg-devel 1.15.99.2 and confirmed the error is still present
@evadot @bapt we managed to work around this issue on pfSense with the change applied in this commit - https://github.com/pfsense/FreeBSD-ports/commit/839c8b7801d4716351ffeb2d1ce82f9dbb4f92fd
The condition that triggers the problem is pd > 0, which sets should_waitpid to false. Then, when the code reaches while (!feof(f) && !ferror(f) && getline(&line, &linecap, f) > 0) { it stays there forever. The last item we saw in the truss output was a read(), maybe from inside getline()?
Just to be clear, it never enters the body of that while() loop; we believe it gets stuck inside the getline() call.
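One mechanism that would produce this symptom (an assumption; nothing in this thread confirms it) is a daemon started by the script inheriting the write end of the pipe that pkg reads from, so the reader never sees EOF even after the script itself has exited. The Python sketch below reproduces that behaviour in a bounded way. The helper name time_read_to_eof and the use of sleep 2 as a stand-in for a long-lived daemon are illustrative only and are not taken from pkg or pfSense.

```python
import subprocess
import time


def time_read_to_eof(script: str) -> float:
    """Run `script` under /bin/sh with stdout on a pipe and return how many
    seconds pass before the reader sees EOF on that pipe."""
    start = time.monotonic()
    proc = subprocess.Popen(["/bin/sh", "-c", script], stdout=subprocess.PIPE)
    # read() returns only once *every* process holding the write end of the
    # pipe has closed it, not merely when the shell itself has exited.
    proc.stdout.read()
    proc.wait()
    return time.monotonic() - start


# The background job inherits the pipe as its stdout, so EOF is delayed
# until the job exits (a real daemon would delay it indefinitely).
inherited = time_read_to_eof("echo done; sleep 2 &")

# Same script, but the background job's output is redirected away from
# the pipe, so EOF arrives as soon as the shell exits.
detached = time_read_to_eof("echo done; sleep 2 >/dev/null 2>&1 &")

print(f"pipe inherited by background job: EOF after {inherited:.1f}s")
print(f"background job detached from pipe: EOF after {detached:.1f}s")
```

If this is what is happening here, it would also fit the ps output above, where the sh child is already defunct while pkg is still blocked in a read.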
This should be fixed in 1.16, I rewrote that read/wait loop and added a regression test. https://github.com/freebsd/pkg/pull/1893
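For readers curious what a read/wait loop that tolerates lingering descendants can look like, here is a minimal Python sketch. It is not the code from the pull request; the function name run_and_collect and the polling interval are hypothetical, and it assumes the script's output arrives on a pipe that a background process may keep open.

```python
import os
import select
import subprocess


def run_and_collect(script: str, poll_interval: float = 0.1):
    """Run `script` under /bin/sh, collect what it writes to stdout, and
    return (exit_status, output) once the script itself has exited, even
    if a background descendant still holds the pipe open."""
    proc = subprocess.Popen(["/bin/sh", "-c", script], stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    chunks = []
    exited = False
    while True:
        # After the script has exited, only drain what is already buffered.
        timeout = 0 if exited else poll_interval
        ready, _, _ = select.select([fd], [], [], timeout)
        if ready:
            data = os.read(fd, 4096)
            if not data:  # real EOF: every writer closed the pipe
                break
            chunks.append(data)
        elif exited:
            break  # script is gone and nothing is pending: stop waiting
        else:
            exited = proc.poll() is not None
    proc.wait()
    proc.stdout.close()
    return proc.returncode, b"".join(chunks).decode()


# The background sleep keeps the pipe open for 3 seconds, but the call
# returns as soon as the shell has exited and its output is drained.
status, output = run_and_collect("echo hello; sleep 3 &")
print(status, repr(output))
```

The design choice in this sketch is to treat the exit of the script process, rather than EOF on the pipe, as the signal to stop reading, with one final non-blocking drain so that output written just before exit is not lost.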
(Arguably, a package script shouldn't be restarting daemons or changing other system state outside the specific package. But appliances can make that awkward.)
Arguably, a package script shouldn't be restarting daemons or changing other system state outside the specific package
I don't see how that relates to breaking basic scripting. The reaper code also made sure to break this intentionally a while back, but it is what it is. ;)