post-install script locks pkg forever

Open rbgarga opened this issue 4 years ago • 4 comments

This bug has been happening on pfSense installations since we moved to pkg 1.13.x.

pfSense has extra modules that can be installed to add functionality to the product. These modules are packages built from the pfSense ports tree and consist mostly of static files (PHP, XML, shell scripts) that are copied to their final directories. The packages have pre-deinstall, deinstall and post-install actions, implemented with the regular post-install/post-deinstall shell scripts. Each of these actions calls a special script: /etc/rc.packages $PKGNAME $ACTION.

rc.packages is a PHP script that reads all information about the current package from its own XML files and executes the expected actions, for example restarting daemons.
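For readers unfamiliar with this setup, the hook described above might look roughly like the sketch below. It is an illustration only, not the actual pfSense script: the package name pfSense-pkg-Example, the RC_PACKAGES override variable, the run_hook helper and the POST-INSTALL action string are all assumptions made for the example. Only the /etc/rc.packages $PKGNAME $ACTION call shape comes from the description above.

```shell
#!/bin/sh
# Sketch of a package hook script. All names here are hypothetical.
PKGNAME="pfSense-pkg-Example"

# Assumption: RC_PACKAGES exists only so the handler can be swapped for a
# stub when experimenting; by default it points at the real script.
RC_PACKAGES="${RC_PACKAGES:-/etc/rc.packages}"

# Hand the package name and the action to the central handler, mirroring
# the "/etc/rc.packages $PKGNAME $ACTION" call described above.
run_hook() {
    action="$1"
    "$RC_PACKAGES" "$PKGNAME" "$action"
}
```

A post-install script built this way would end with a call such as run_hook POST-INSTALL, and pkg would wait for that call to return.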

Since we moved to pkg 1.13.x, we have been seeing the issue when users reinstall or upgrade one of these packages. The post-install execution gets stuck forever, until the pkg process is killed. I've added debug code to /etc/rc.packages and I can see that it is executed and exits as expected.

When it happens, ps shows a defunct process as child of pkg, as you can see here:

root    82673  0.0  0.3 11352  2148  0  I+   11:15       0:01.67 | |     `-- truss /usr/local/sbin/pkg upgrade -fy pfSense-pkg-Quagga_OSPF-0.6.21_5
root    82808  0.0  1.9 38256 13796  0  IX+  11:15       0:01.46 | |       `-- /usr/local/sbin/pkg upgrade -fy pfSense-pkg-Quagga_OSPF-0.6.21_5
quagga    715  0.0  0.5 13424  3432  -  Is   11:15       0:00.01 | |         |-- /usr/local/sbin/zebra -d -f /var/etc/quagga/zebra.conf
quagga   1198  0.0  0.5 13836  3624  -  Ss   11:15       0:00.27 | |         |-- /usr/local/sbin/ospfd -d -f /var/etc/quagga/ospfd.conf
quagga   1856  0.0  0.6 15204  4536  -  Is   11:15       0:00.00 | |         |-- /usr/local/sbin/bgpd -d -f /var/etc/quagga/bgpd.conf
root    85976  0.0  0.0     0     0  0  Z+   11:15       0:00.11 | |         `-- <defunct>

According to procstat, this defunct process's binary is sh, so I suspect it's the post-install script itself.

I ran this test with truss to try to collect some output. I'm attaching the full result. Please let me know if there is anything I can do to collect more useful data. output.txt

rbgarga avatar Sep 28 '20 19:09 rbgarga

I tested pkg-devel 1.15.99.2 and confirmed the error is still present.

rbgarga avatar Sep 29 '20 11:09 rbgarga

@evadot @bapt we managed to work around this issue on pfSense with the change applied in this commit: https://github.com/pfsense/FreeBSD-ports/commit/839c8b7801d4716351ffeb2d1ce82f9dbb4f92fd

The problem is triggered when pd > 0, which sets should_waitpid to false. When the code then reaches while (!feof(f) && !ferror(f) && getline(&line, &linecap, f) > 0) {, it stays there forever. The last call we saw in the truss output was a read(), possibly from inside getline().

Just to be clear, it never enters the body of that while() loop; we believe it gets stuck inside the getline() call.
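One way a read on a script's output pipe can block like this, offered as a guess rather than a confirmed diagnosis of pkg: a reader only sees end-of-file once every process holding the write end of the pipe has closed it. If the script starts a daemon that inherits its stdout, the pipe stays open after the script itself has exited. The sketch below reproduces that general behavior with plain sh. It does not use pkg at all, sleep stands in for a restarted daemon, and the names leaky_script, clean_script and time_script are made up for the example.

```shell
#!/bin/sh
# Hypothetical stand-in for a post-install script: it starts a long-lived
# background process that inherits stdout, then exits right away.
leaky_script='sleep 2 & echo "script done"'

# Same script, but the background process is detached from the inherited
# stdout/stderr, so it holds no reference to the reader's pipe.
clean_script='sleep 2 >/dev/null 2>&1 & echo "script done"'

# Run a script through command substitution and record how long the reader
# waits. Command substitution reads from a pipe until EOF, which only
# arrives once every writer has closed its end of that pipe.
time_script() {
    start=$(date +%s)
    out=$(sh -c "$1")
    end=$(date +%s)
    secs=$((end - start))
}

time_script "$leaky_script"; leaky_secs=$secs
time_script "$clean_script"; clean_secs=$secs

echo "leaky: ${leaky_secs}s, clean: ${clean_secs}s"
```

In both cases the script prints its line and exits immediately. Only the leaky variant keeps the reader waiting until the background process ends, which for a real daemon would mean indefinitely.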

rbgarga avatar Sep 29 '20 18:09 rbgarga

This should be fixed in 1.16: I rewrote that read/wait loop and added a regression test. https://github.com/freebsd/pkg/pull/1893

(Arguably, a package script shouldn't be restarting daemons or changing other system state outside the specific package. But appliances can make that awkward.)

cgull avatar Dec 04 '20 20:12 cgull

> Arguably, a package script shouldn't be restarting daemons or changing other system state outside the specific package

I don't see how that relates to breaking basic scripting. The reaper code also made sure to break this intentionally a while back, but it is what it is. ;)

fichtner avatar Dec 05 '20 08:12 fichtner