Protection from oversized logs
I just built math/atlas in poudriere, and something bad happened:
% du -Ah atlas-3.8.4_7,1.log
606G atlas-3.8.4_7,1.log
however, thankfully:
% du -h atlas-3.8.4_7,1.log
7.9G atlas-3.8.4_7,1.log
If you're curious, here is what the log contains:
...
gcc48 -I/wrkdirs/usr/ports/math/atlas/work/ATLAS/shared/..//CONFIG/include -O2 -pipe -fstack-protector -Wl,-rpath=/usr/local/lib/gcc48 -fno-strict-aliasing -o xatlbench atlbench.o atlconf_misc.o
atlconf_misc.o: In function `CmndResults':
atlconf_misc.c:(.text+0xbbb): warning: warning: tmpnam() possibly used unsafely; consider using mkstemp()
/usr/bin/make -f Make.top time
./xatlbench -dc /wrkdirs/usr/ports/math/atlas/work/ATLAS/shared/bin/INSTALL_LOG -dp /wrkdirs/usr/ports/math/atlas/work/ATLAS/shared/ARCHS/Core264SSE3
Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: (continues forever)
Anyway, I see two problems here:
- The log will grow and eat all available disk space.
- Even if the process is killed, the bzcat process which, as I understand, greps the log for errors reported by pkg and the ports framework, takes forever. Worse, it is run multiple times, and worse still, the log happens to be a sparse file, so it greps 600GB instead of 8.
While the port should be fixed, of course (I'm thinking of adding IS_INTERACTIVE), it would be nice to have basic protection against such problems, as we have for builds that take too long. The apparent log size (to protect against big sparse files) could be monitored by a separate process, and the build marked as failed if the log size exceeds some sane value like 100MB-1GB.
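To make the proposal concrete, here is a minimal sketch of such a monitor in plain sh. It is not poudriere code: the function names (log_apparent_size, log_too_big, watch_log) and the polling approach are assumptions made for illustration. It relies on ls -l reporting the apparent size, so a sparse or compressed log is still caught.

```shell
#!/bin/sh
# Hypothetical helpers, not part of poudriere.

# Print the apparent size of a file in bytes (5th column of ls -ln).
log_apparent_size() {
    ls -ln -- "$1" | awk '{ print $5 }'
}

# Succeed if the log's apparent size exceeds the limit (in bytes).
log_too_big() {
    [ "$(log_apparent_size "$1")" -gt "$2" ]
}

# Poll the log while the build process is alive.
# Kills the build and returns 1 if the log exceeds the limit,
# returns 0 if the build exits on its own first.
# Usage: watch_log <pid> <logfile> <max_bytes> <interval_seconds>
watch_log() {
    pid=$1 log=$2 max=$3 interval=$4
    while kill -0 "$pid" 2>/dev/null; do
        if log_too_big "$log" "$max"; then
            echo "log $log exceeded $max bytes, killing $pid" >&2
            kill "$pid" 2>/dev/null
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

A caller could start the build in the background, run something like `watch_log "$build_pid" "$logfile" 1073741824 10`, and mark the build as failed on a non-zero return.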
Wow that's crazy. I'm confused on your report though. How did it go from 606G to 7.9G? Why do you mention bzcat? We don't compress logs at all. bzgrep is used in processonelog.sh only because it was stolen from Tinderbox/Portbuild.
> Wow that's crazy. I'm confused on your report though. How did it go from 606G to 7.9G?
That was a typo (fixed). 606G is the apparent size (which ls -l also reports); 7.9G is the real size. I don't remember the details anymore; maybe it really was bzgrep. If you want, I can rerun this build to gather more info.
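For anyone who wants to see the difference between apparent and real size without building atlas, a small sketch follows. It creates a file that is mostly a hole and prints both numbers. The helper names are made up for illustration, and it assumes truncate(1) is available and that the filesystem supports sparse files; the allocated figure will vary by filesystem.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Apparent size in bytes: the file length, same number ls -l shows.
apparent_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Allocated size in KiB: the blocks actually used on disk, as du reports.
allocated_kib() {
    du -k "$1" | awk '{ print $1 }'
}

demo=$(mktemp)
# Extend the file to 100 MiB without writing any data (creates a hole).
truncate -s 100M "$demo"

echo "apparent:  $(apparent_bytes "$demo") bytes"
echo "allocated: $(allocated_kib "$demo") KiB"

rm -f "$demo"
```

Anything that reads the file sequentially, such as a grep over the whole log, has to go through the apparent size, not the allocated one.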
I'm guessing this is on a ZFS dataset with compression enabled. By default the log dir uses lz4.
Correct
We can add a check in nohang for a file that has exceeded 1GB and kill it as a runaway.
Just had processonelog.sh run for hours on a particularly large log file (gcc-arm-embedded). Perhaps /usr/bin/timeout would be useful here? Maybe we could specify a timeout in poudriere.conf?
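A rough sketch of what that could look like, assuming timeout(1) is installed. LOG_PROCESS_TIMEOUT and run_with_timeout are hypothetical names, not existing poudriere.conf settings or poudriere functions, and the processonelog.sh invocation in the comment is only a placeholder.

```shell
#!/bin/sh
# Hypothetical setting: seconds allowed for processing one log.
# Not an existing poudriere.conf option.
LOG_PROCESS_TIMEOUT=${LOG_PROCESS_TIMEOUT:-600}

# Run a command under timeout(1) and report when the limit was hit.
# timeout exits with status 124 when it had to kill the command.
# Usage: run_with_timeout <seconds> <command> [args...]
run_with_timeout() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Placeholder usage (the arguments are an assumption):
# run_with_timeout "$LOG_PROCESS_TIMEOUT" sh processonelog.sh "$logfile"
```

With this approach a runaway log costs at most the configured number of seconds per processing pass instead of hours.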
There's now a way to disable log processing completely: https://github.com/freebsd/poudriere/pull/953