MPI.jl

MPIPreferences fails with vendor="cray"

Open omlins opened this issue 1 year ago • 2 comments

Error message:

julia> using MPIPreferences; MPIPreferences.use_system_binary(mpiexec="srun", vendor="cray")
Error:
Compiling x86_64 targets is not supported on aarch64 hosts.

ERROR: failed process: Process(`cc --cray-print-opts=all`, ProcessExited(255)) [255]

Stacktrace:
 [1] pipeline_error
   @ ./process.jl:565 [inlined]
 [2] read(cmd::Cmd)
   @ Base ./process.jl:449
 [3] read
   @ ./process.jl:458 [inlined]
 [4] readchomp
   @ ./io.jl:974 [inlined]
 [5] analyze_cray_cc()
   @ MPIPreferences.CrayParser /capstor/scratch/cscs/omlins/julia_local/julia_depot/packages/MPIPreferences/PLH7x/src/parse_cray_cc.jl:67
 [6] use_system_binary(; library_names::Vector{…}, extra_paths::Vector{…}, mpiexec::String, abi::Nothing, vendor::String, export_prefs::Bool, force::Bool)
   @ MPIPreferences /capstor/scratch/cscs/omlins/julia_local/julia_depot/packages/MPIPreferences/PLH7x/src/MPIPreferences.jl:180
 [7] top-level scope
   @ REPL[1]:1
Some type information was truncated. Use `show(err)` to see complete types.


Loaded modules:

[todi][omlins@nid007359 codes]$ module list

Currently Loaded Modules:
  1) craype-x86-rome                        8) cudatoolkit/23.9_12.2
  2) libfabric/1.15.2.0                     9) gcc-native/12.3
  3) craype-network-ofi                    10) craype/2.7.30
  4) xpmem/2.8.2-1.0_3.7__g84a27a5.shasta  11) cray-mpich/8.1.28
  5) perftools-base/23.12.0                12) cray-libsci/23.12.5
  6) cpe/23.12                             13) PrgEnv-gnu/8.5.0
  7) cray/23.12

 

omlins, Jul 31 '24 16:07

Progress report from Slack: adding -target-accel=nvidia90 -target-cpu=aarch64 makes the Cray compiler wrapper behave. Now just thinking about how best to set this. It's a bit of a chicken-and-egg problem: you need to know the accelerator type to get the compiler wrapper to tell you what the accelerator's GTL library is called.

urgh....

CRRRRRRAAAAAAAAAAYYYYYY!!!!

JBlaschke, Jul 31 '24 18:07

OK, maybe not all Cray's fault. The problem is that the compiler wrappers look for the CRAY_ACCEL_TARGET and CRAY_CPU_TARGET env vars, which are normally set, just not on Alps at the moment, so I forgot about them. Most sites provide a module that sets them (craype-accel-nvidia and craype-accel-nvidia80 on Perlmutter) and often load it by default.

Leaving this issue open to remind me to write some env checks for those vars, and if they are not set, present the user with a sensible error message.

JBlaschke, Jul 31 '24 21:07
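
For illustration, the env checks proposed in the last comment could look roughly like the sketch below. This is not code from MPIPreferences: the helper names `missing_cray_targets` and `check_cray_targets` are hypothetical, and the only assumption taken from the thread is that the wrapper needs CRAY_ACCEL_TARGET and CRAY_CPU_TARGET to be set and non-empty.

```julia
# Hypothetical helpers, NOT part of MPIPreferences. They check the two
# environment variables that, per this thread, the Cray compiler wrappers
# look for.
const CRAY_TARGET_VARS = ("CRAY_ACCEL_TARGET", "CRAY_CPU_TARGET")

# Return the names of the required variables that are unset or empty.
# `env` defaults to the process environment; any dictionary-like object
# with string keys and values works, which makes the check easy to test.
function missing_cray_targets(env=ENV)
    return [name for name in CRAY_TARGET_VARS if isempty(get(env, name, ""))]
end

# Throw a descriptive error if any required variable is missing,
# otherwise return `nothing`.
function check_cray_targets(env=ENV)
    missing_vars = missing_cray_targets(env)
    isempty(missing_vars) && return nothing
    error("""
        The Cray compiler wrapper `cc` cannot be queried because these \
        environment variables are not set: $(join(missing_vars, ", ")).
        They are usually set by a site-provided module (for example a \
        craype-accel-* module). Load the matching module or export the \
        variables, then retry.""")
end
```

Calling something like `check_cray_targets()` before `cc --cray-print-opts=all` is run would replace the opaque `failed process` error above with a message that names the missing variables. Where exactly such a check belongs in MPIPreferences is left open here.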