
lbann failing to build on power9 architecture (lassen)

Open wderekjones opened this issue 4 years ago • 3 comments

While attempting to spack install lbann on lassen, I ran into an issue installing the opencv and cudnn dependencies. Here is a snippet of the output when compiling with gcc 7.2.1. I've also tried other compilers, but none of them produced a successful install.

I run this command:

spack install lbann+gpu+nccl ^[email protected] 

and get this as output

==> Warning: Missing a source id for cnpy@master
==> Warning: Missing a source id for conduit@master
==> Warning: Missing a source id for [email protected]

.... 

==> 110524: Installing opencv
==> Using cached archive: /usr/WS1/jones289/spack/var/spack/cache/_source-cache/archive/9c/9ccb2192d7e8c03c58fee07051364d94ed7599363f3b0dce1c5e6cc11c1bb0ec.tar.gz
==> Staging archive: /var/tmp/jones289/spack-stage/spack-stage-opencv-4.2.0-xgzxklyy6krbgzn7y66bgm6kibyziwo2/4.2.0.tar.gz
==> Created stage in /var/tmp/jones289/spack-stage/spack-stage-opencv-4.2.0-xgzxklyy6krbgzn7y66bgm6kibyziwo2
==> No patches needed for opencv
==> 110524: opencv: Building opencv [CMakePackage]
==> 110524: opencv: Executing phase: 'cmake'
==> Error: ProcessError: Command exited with status 2:
    'make' '-j16'
....
See build log for details:
  /var/tmp/jones289/spack-stage/spack-stage-opencv-4.2.0-xgzxklyy6krbgzn7y66bgm6kibyziwo2/spack-build-out.txt
Traceback (most recent call last):
  File "/usr/WS1/jones289/spack/lib/spack/spack/build_environment.py", line 801, in child_process
    return_value = function()
  File "/usr/WS1/jones289/spack/lib/spack/spack/installer.py", line 1109, in build_process
    phase(pkg.spec, pkg.prefix)
  File "/usr/WS1/jones289/spack/lib/spack/spack/package.py", line 108, in phase_wrapper
    phase(spec, prefix)
  File "/usr/WS1/jones289/spack/lib/spack/spack/build_systems/cmake.py", line 248, in build
    inspect.getmodule(self).make(*self.build_targets)
  File "/usr/WS1/jones289/spack/lib/spack/spack/build_environment.py", line 131, in __call__
    return super(MakeExecutable, self).__call__(*args, **kwargs)
  File "/usr/WS1/jones289/spack/lib/spack/spack/util/executable.py", line 189, in __call__
    proc.returncode, long_msg)
spack.util.executable.ProcessError: Command exited with status 2:
    'make' '-j16'


Then aluminum, hydrogen, and ultimately lbann fail to build.

I truncated the output because there is so much. Let me know if you need additional details.

wderekjones avatar Mar 06 '20 22:03 wderekjones

It would be good to see why make failed. Can you send a complete log?
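If the full log is too large to paste, one option is to pull out just the lines around the actual failure from the `spack-build-out.txt` file named in the error message. The following is only a minimal sketch: the helper name `find_error_lines` and the error pattern are illustrative assumptions, not part of Spack or LBANN, and the pattern will not catch every kind of failure.

```python
import re
import sys
from pathlib import Path

# Assumed markers of a failure in make/compiler output: the word "error"
# (any case) or make's "***" prefix. This is a heuristic, not an exhaustive list.
ERROR_PATTERN = re.compile(r"\berror\b|\*\*\*", re.IGNORECASE)


def find_error_lines(log_text, context=2):
    """Return (line_number, text) pairs for matching lines and the
    `context` lines that precede each match. Line numbers are 1-based."""
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            keep.update(range(max(0, i - context), i + 1))
    return [(i + 1, lines[i]) for i in sorted(keep)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_errors.py /path/to/spack-build-out.txt
    log_path = Path(sys.argv[1])
    for number, text in find_error_lines(log_path.read_text(errors="replace")):
        print(f"{number}: {text}")
```

Because `make -j16` interleaves output from parallel jobs, the first reported error is usually the one worth reading; later ones are often knock-on failures.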

In nit-picky details: on lassen we recommend GCC 7.3.1; is there a reason you prefer the 7.2 series? IIRC, there was some problem with that compiler on Lassen.

benson31 avatar Mar 11 '20 16:03 benson31

Also, I appreciate your attempt to build using a 1-liner spackism (because it's not supposed to be a crazy-complicated thing). So I would very much like to debug the issue you're having.

However, I would be remiss if I didn't point you to the build documentation: https://lbann.readthedocs.io/en/latest/building_lbann.html. This describes a set of scripts that can be used to finagle the build a bit to make use of externals and other LC idiosyncrasies. If you're just trying to get up and running, that might be a more reliable approach.

benson31 avatar Mar 11 '20 17:03 benson31

Just seeing this. I will try the latest solution you propose. And no, personally I don't have a compiler preference; I probably tried a few others as well. Whatever works is fine with me.

wderekjones avatar Mar 21 '20 00:03 wderekjones