poudriere

Protection from oversized logs

Open AMDmi3 opened this issue 11 years ago • 7 comments

I just happened to build math/atlas in poudriere, and something bad happened:

% du -Ah atlas-3.8.4_7,1.log
606G    atlas-3.8.4_7,1.log

however, thankfully:

% du -h atlas-3.8.4_7,1.log
7.9G    atlas-3.8.4_7,1.log

If you're curious,

...
gcc48 -I/wrkdirs/usr/ports/math/atlas/work/ATLAS/shared/..//CONFIG/include  -O2 -pipe  -fstack-protector -Wl,-rpath=/usr/local/lib/gcc48 -fno-strict-aliasing -o xatlbench atlbench.o atlconf_misc.o
atlconf_misc.o: In function `CmndResults':
atlconf_misc.c:(.text+0xbbb): warning: warning: tmpnam() possibly used unsafely; consider using mkstemp()
/usr/bin/make -f Make.top time
./xatlbench -dc /wrkdirs/usr/ports/math/atlas/work/ATLAS/shared/bin/INSTALL_LOG -dp /wrkdirs/usr/ports/math/atlas/work/ATLAS/shared/ARCHS/Core264SSE3
Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: Enter Clock rate in Mhz [0]: (continues forever)

Anyway, I see two problems here:

  • The log will grow and eat all available disk space
  • Even if the process is killed, the bzcat process which, as I understand it, greps the log for errors reported by pkg and the ports framework, takes forever. What is worse, it's run multiple times, and worse still, the log happens to be a sparse file, so it'll grep 600GB instead of 8.

While the port should of course be fixed (I'm thinking of adding IS_INTERACTIVE), it'd be nice to have basic protection against such problems, as we have for builds taking too long. The apparent log size (to protect against big sparse files) could be monitored by a separate process, and the build marked as failed if the log size exceeds some sane value like 100MB-1GB.
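
The monitoring idea above could be sketched roughly as follows. This is only an illustration, not how poudriere does (or would) implement it: `log_too_big` and `watch_log` are hypothetical names, and the sketch assumes that `wc -c` on a regular file reports the apparent size without reading the whole file, which is true of common implementations but not guaranteed.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the apparent size of file $1
# is greater than $2 bytes. A missing file counts as "not too big".
log_too_big() {
    [ -f "$1" ] || return 1
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -gt "$2" ]
}

# Hypothetical watchdog: while the build process (PID $1) is alive, poll the
# log $2 every $4 seconds (default 1) and kill the build once the log
# exceeds $3 bytes. Returns 1 if the build was killed, 0 if it ended by itself.
watch_log() {
    pid=$1 log=$2 limit=$3 interval=${4:-1}
    while kill -0 "$pid" 2>/dev/null; do
        if log_too_big "$log" "$limit"; then
            kill "$pid" 2>/dev/null
            return 1    # the caller would mark the build as failed here
        fi
        sleep "$interval"
    done
    return 0
}
```

With a 1GB limit this would be called as `watch_log "$build_pid" "$logfile" 1073741824 10`, where both variable names are placeholders.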

AMDmi3 avatar Mar 04 '15 19:03 AMDmi3

Wow that's crazy. I'm confused on your report though. How did it go from 606G to 7.9G? Why do you mention bzcat? We don't compress logs at all. bzgrep is used in processonelog.sh only because it was stolen from Tinderbox/Portbuild.

bdrewery avatar Apr 09 '15 18:04 bdrewery

Wow that's crazy. I'm confused on your report though. How did it go from 606G to 7.9G?

That was a typo (fixed). 606G is the apparent size (which ls -l also reports), 7.9G is the real size. I don't remember the details anymore, maybe it really was bzgrep. If you want, I can rerun this build to gather more info.
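
For anyone wanting to compare the two numbers on their own logs, here is a minimal sketch. `apparent_bytes` and `allocated_kb` are hypothetical helper names. The `-A` flag used in the report is specific to FreeBSD's du (GNU du spells it `--apparent-size`), so the sketch sticks to `wc -c` and `du -k`, which should behave the same on both.

```shell
#!/bin/sh
# Hypothetical helper: apparent size in bytes (the size that ls -l reports).
apparent_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Hypothetical helper: space actually allocated on disk, in kilobytes.
# On a sparse file or a compressed dataset this can be far smaller than
# the apparent size.
allocated_kb() {
    du -k "$1" | awk '{ print $1 }'
}
```

Running `apparent_bytes some.log` and `allocated_kb some.log` side by side shows whether a log is much larger in apparent size than on disk; `some.log` is a placeholder.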

AMDmi3 avatar Apr 09 '15 18:04 AMDmi3

I'm guessing this is on a ZFS dataset with compression enabled. By default the log dir uses lz4.

bdrewery avatar Apr 09 '15 18:04 bdrewery

Correct

AMDmi3 avatar Apr 09 '15 18:04 AMDmi3

We can add a check in nohang for a log file that has exceeded 1GB and kill the build as a runaway.

bdrewery avatar Jun 14 '17 16:06 bdrewery

Just had processonelog.sh run for hours on a particularly large log file (gcc-arm-embedded). Perhaps /usr/bin/timeout would be useful here? Maybe we could specify a timeout in poudriere.conf?
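
A rough sketch of what wrapping a command in timeout(1) could look like, assuming `timeout` is on the PATH. `run_with_limit` is a hypothetical helper and nothing here reflects an existing poudriere.conf setting. It relies on timeout(1) exiting with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a time limit of $1 seconds.
# timeout(1) exits with status 124 when the limit is hit; otherwise the
# exit status of the command itself is passed through.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

For instance, `run_with_limit 600 some-log-processor` would return 124 if the command ran for more than 600 seconds; the command name is a placeholder.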

swills avatar Feb 02 '22 14:02 swills

There's now a way to disable log processing completely: https://github.com/freebsd/poudriere/pull/953

AMDmi3 avatar Feb 05 '22 20:02 AMDmi3