easybuild-framework
easybuild-framework copied to clipboard
Meaningful error output (and actual location of the log file on the line about log files)
The worst part of Easybuild, when it fails, is tracking down where things went wrong.
While rebuilding a LOONG toolchain, one ends up with something like this:
== configuring...
== ... (took 5 secs)
== building...
== ... (took 31 mins 50 secs)
== FAILED: Installation ended unsuccessfully (build directory: /dev/shm/swmanage/GCCcore/11.2.0/system-system): build failed (first 300 chars): cmd " make -j 20 bootstrap " exited with exit code 2 and output:
echo stage3 > stage_final
make[1]: Entering directory `/dev/shm/swmanage/GCCcore/11.2.0/system-system/gcc-11.2.0/stage3_obj'
make[2]: Entering directory `/dev/shm/swmanage/GCCcore/11.2.0/system-system/gcc-11.2.0/stage3_obj'
make[2]: (took 32 mins 11 secs)
== Results of the build can be found in the log file(s)
ERROR: Build of /p/fastdata/zam/swmanage/EasyBuild/2022/Golden_Repo/g/GCCcore/GCCcore-11.2.0.eb failed (err: 'build failed (first 300 chars): cmd " make -j 20 bootstrap " exited with exit code 2 and output:\necho stage3 > stage_final\nmake[1]: Entering directory `/dev/shm/swmanage/GCCcore/11.2.0/system-system/gcc-11.2.0/stage3_obj\'\nmake[2]: Entering directory `/dev/shm/swmanage/GCCcore/11.2.0/system-system/gcc-11.2.0/stage3_obj\'\nmake[2]: ')
- Great, now where's the error?
No idea. The output itself is useless.
- Great, where is the log file?
No idea - it was written at the beginning, but it's not written here at the end. The buffer scrolled so much that there's no location of the logfile here anymore. Time to fish /tmp
again... Plus, the file at the beginning gets deleted, although the directory is kept.
Suggestions:
- Improve the error output here. This "first 300 chars" is nonsense.
- Tell where the logfile is, at the line which says:
== Results of the build can be found in the log file(s)
. - How about:
== Results of the build can be found in the log file at XXXXX
@surak I fully agree that we should do a lot better here. We should make an honest attempt to fish out the actual error message from the log, rather than only showing the first 300 characters, since that's indeed usually useless.
I didn't actually realize that the location of the log file isn't printed at the end at all, that's definitely an easy change to make.
For me it's not a problem because I have the use of eb --last-log
deeply engraved into my muscle memory (see also https://easybuilders.github.io/easybuild-tutorial/2021-isc21/troubleshooting/#last-log).
I have an ebl
alias for this to save more keystrokes:
alias ebl='vim $(eb --last-log | grep -v ">>")'
The grep -v
part is only there because I always have $EB_VERBOSE
set.
The Results ...
line should actually include the path to the application build log, see https://github.com/easybuilders/easybuild-framework/blob/0c1a642848681a71709c2dffefdc305a86cb789b/easybuild/framework/easyblock.py#L4186-L4187
For some reason it's not for you though? So the glob
that defines the logs
variable comes back with an empty list somehow?
Can you check the temporary directory created by EasyBuild, are the logs actually still there?
Also, can you share the output of eb --show-config
?
for me the log files shown at the beginning and at the end are not the same, which is a bit confusing. at the beginning:
== Temporary log file in case of crash /dev/shm/vsc10009/eb-e_w78y/easybuild-6J_zLt.log
at the end (after a crash):
== Results of the build can be found in the log file(s) /dev/shm/vsc10009/eb-e_w78y/easybuild-GROMACS-2021.4-20211222.085224.wZvkK.log
@smoors That's because there are two types of log files: a "session" log file, and a separate log file per installation being performed (which gets copied to the installation directory on completion); see also https://docs.easybuild.io/en/latest/Logfiles.html
The
Results ...
line should actually include the path to the application build log, seehttps://github.com/easybuilders/easybuild-framework/blob/0c1a642848681a71709c2dffefdc305a86cb789b/easybuild/framework/easyblock.py#L4186-L4187
For some reason it's not for you though? So the
glob
that defines thelogs
variable comes back with an empty list somehow?Can you check the temporary directory created by EasyBuild, are the logs actually still there?
Also, can you share the output of
eb --show-config
?
@surak Can you clarify?
I think that the issue description is clear enough, no? There's no pointer at the actual error, neither where to find it.
What I have been doing on the supercomputers for years is to run everything with -l
- as most errors tend to show close to the end, with a bit of a scroll at least.
On other environments, -l
seems to print much more than before, or so was my impression. When updating the timer, does it do some ascii commands on the timer line or does it print everything again?
In any case, if something fails, the last thing I want to see on the screen is either:
- what caused the error,
- where to look more.
Currently, I have neither.
== installing extension rgl 0.107.14 (386/1038)...
>> defining build environment for gcccoremkl/11.2.0-2021.4.0 toolchain
>> running command:
[started at: 2022-01-24 22:18:12]
[working dir: /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0/rgl]
[output logged in /tmp/eb-uw4uv1em/easybuild-run_cmd-ytni5pln.log]
tar xzf /p/project/ccstao/cstao05/easybuild/juwelsbooster/sources/r/R/extensions/rgl_0.107.14.tar.gz
>> command completed: exit 0, ran in < 1s
>> running command:
[started at: 2022-01-24 22:18:12]
[working dir: /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0/rgl/rgl]
[output logged in /tmp/eb-uw4uv1em/easybuild-run_cmd-eh_rwk__.log]
R CMD INSTALL /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0/rgl/rgl
--library=/p/project/ccstao/cstao05/easybuild/juwelsbooster/software/R/4.1.2-gcccoremkl-11.2.0-2021.4.0/lib64/R/library --no-clean-on-error
>> command completed: exit 1, ran in 00h01m25s
== ... (took 1 min 26 secs)
== ... (took 3 hours 8 mins 51 secs)
== FAILED: Installation ended unsuccessfully (build directory: /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0): build failed (first 300
chars): cmd " R CMD INSTALL /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0/rgl/rgl
--library=/p/project/ccstao/cstao05/easybuild/juwelsbooster/software/R/4.1.2-gcccoremkl-11.2.0-2021.4.0/lib64/R/library --no-clean-on-error " exited with exit
code 1 and output:
* installing *source* pack (took 3 hours 36 mins 9 secs)
== Results of the build can be found in the log file(s) /tmp/eb-uw4uv1em/easybuild-R-4.1.2-20220124.184330.qfPqj.log
Installing 'rgl' extension (386/1038) ━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━ 3:08:15
ERROR: Build of /p/project/cjsc/a.strube/easybuild-repository/Golden_Repo/r/R/R-4.1.2-gcccoremkl-11.2.0-2021.4.0.eb failed (err: 'build failed (first 300 chars): cmd " R CMD INSTALL /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0/rgl/rgl --library=/p/project/ccstao/cstao05/easybuild/juwelsbooster/software/R/4.1.2-gcccoremkl-11.2.0-2021.4.0/lib64/R/library --no-clean-on-error " exited with exit code 1 and output:\n* installing *source* pack')
It's like... what happened? I don't know. And the error log is not the last thing I see: Rather, it's completely useless info. I have to fish for the log file on the letter soup here.