Cray Clang linker error on Frontier when building SCORPIO with openmp flag and ADIOS2 lib

Open dqwu opened this issue 2 years ago • 2 comments

This issue was initially reproduced on Frontier when building a scream ne1024 F case with ADIOS support.

machine: frontier-scream-gpu
compiler: crayclang-scream
LND_NTHRDS: set to a value larger than 1
modules loaded: craype-accel-amd-gfx90a rocm/5.4.0 and others
CMAKE_CXX_FLAGS passed to SCORPIO: -fopenmp and others
WITH_ADIOS2 passed to SCORPIO: ON

The linker command fails on spio_finfo.exe. One possible workaround is to turn off the CMake option PIO_ENABLE_TOOLS.

This issue can also be reproduced on Frontier with a standalone SCORPIO build.

module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio
git checkout a8d5e37

mkdir build
cd build

ADIOS2_DIR=/ccs/proj/cli115/software/adios/adios2-2.8.3-pr3345/crayclang/15.0.0 \
CC=cc CXX=CC FC=ftn \
cmake -Wno-dev \
-DWITH_ADIOS2=ON \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
-DCMAKE_CXX_FLAGS="-fopenmp" \
-DPIO_USE_MALLOC=ON \
..

make

Linker error

/opt/cray/pe/cce/15.0.0/cce-clang/x86_64/bin/llvm-link: error: linked module is broken!
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [tools/spio_finfo/CMakeFiles/spio_finfo.exe.dir/build.make:209: tools/spio_finfo/spio_finfo.exe] Error 1

Note: this issue is not reproducible if the ADIOS2 lib is configured and built with the following settings:

  • [modules]: load craype-accel-amd-gfx90a and rocm/5.4.0
  • [CXXFLAGS]: add the -fopenmp flag
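The PIO_ENABLE_TOOLS workaround mentioned earlier could be applied to the standalone reproducer roughly as follows. This is a sketch, not a verified recipe: it reuses the configure options from the reproducer and only adds -DPIO_ENABLE_TOOLS=OFF. The RUN variable is a hypothetical dry-run helper that is not part of SCORPIO. By default the script only prints the commands; set RUN to an empty string to execute them.

```shell
#!/bin/sh
# Sketch: configure standalone SCORPIO with the tools (e.g. spio_finfo.exe) disabled.
# RUN is a hypothetical dry-run switch: unset -> print commands, RUN= -> execute them.
RUN=${RUN-echo}

# Same ADIOS2 install as in the reproducer above.
ADIOS2_DIR=/ccs/proj/cli115/software/adios/adios2-2.8.3-pr3345/crayclang/15.0.0
export ADIOS2_DIR

$RUN module load craype-accel-amd-gfx90a rocm/5.4.0

# Run from scorpio/build; identical to the reproducer except for PIO_ENABLE_TOOLS.
CC=cc CXX=CC FC=ftn $RUN cmake -Wno-dev \
  -DWITH_ADIOS2=ON \
  -DWITH_NETCDF=OFF \
  -DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
  -DCMAKE_CXX_FLAGS="-fopenmp" \
  -DPIO_USE_MALLOC=ON \
  -DPIO_ENABLE_TOOLS=OFF \
  ..

$RUN make
```

With the tools disabled, the failing spio_finfo.exe link step should no longer be part of the build, at the cost of not getting the SCORPIO command-line tools.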

dqwu, Jul 11 '23 16:07

As a workaround, @dqwu has rebuilt ADIOS2 using the following modules and flags:

  • [modules]: load craype-accel-amd-gfx90a and rocm/5.4.0
  • [CXXFLAGS]: add the -fopenmp flag
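A rebuild of ADIOS2 with these settings might look like the sketch below. Only the module list and the -fopenmp flag come from this thread. The source directory, install prefix, and the RUN dry-run helper are hypothetical, and any other ADIOS2 configure options used for the real Frontier install are not known here. The cc/CC/ftn wrappers are taken from the SCORPIO reproducer.

```shell
#!/bin/sh
# Sketch: rebuild ADIOS2 with the same modules and OpenMP flag used for SCORPIO.
# RUN is a hypothetical dry-run switch: unset -> print commands, RUN= -> execute them.
RUN=${RUN-echo}

# Hypothetical locations; adjust to the actual checkout and install prefix.
ADIOS2_SRC=${ADIOS2_SRC:-./ADIOS2}
ADIOS2_PREFIX=${ADIOS2_PREFIX:-./adios2-install}

# [modules] from the workaround.
$RUN module load craype-accel-amd-gfx90a rocm/5.4.0

# [CXXFLAGS] from the workaround: add -fopenmp.
CC=cc CXX=CC FC=ftn $RUN cmake -S "$ADIOS2_SRC" -B "$ADIOS2_SRC/build" \
  -DCMAKE_INSTALL_PREFIX="$ADIOS2_PREFIX" \
  -DCMAKE_CXX_FLAGS="-fopenmp"

$RUN cmake --build "$ADIOS2_SRC/build"
$RUN cmake --install "$ADIOS2_SRC/build"
```

SCORPIO would then be configured with ADIOS2_DIR pointing at the new install prefix.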

jayeshkrishna, Sep 07 '23 18:09

@jayeshkrishna Recently, SCREAM developers made some changes for machine frontier-scream-gpu and compiler crayclang-scream:

  • Uses rocm/5.1.0 instead of rocm/5.4.0
  • Uses mpicc/mpicxx/ftn as compiler wrappers instead of original cc/CC/ftn
  • Overrides mpicxx to use hipcc with MPICH_CXX=/opt/rocm-5.1.0/bin/hipcc

Accordingly, ADIOS2 libs on Frontier need to be rebuilt with the same settings.

Note that mpicc must also be overridden with MPICH_CC when rebuilding ADIOS2. Otherwise, linking errors like the following occur when building SCORPIO for SCREAM:

ld.lld: error: undefined symbol: __cray_sset_detect
>>> referenced by cm_util.c:65 (ADIOS2-2.9.1/thirdparty/EVPath/EVPath/cm_util.c:65)
>>>               cm_util.c.o:(CMtrace_init) in archive adios2/2.9.1/cray-mpich-8.1.26/crayclang-scream-14.0.0/lib64/libadios2_evpath.a
>>> referenced by evp.c:1057 (ADIOS2-2.9.1/thirdparty/EVPath/EVPath/evp.c:1057)

Updated workaround to rebuild ADIOS2 lib on Frontier for SCREAM:

  • [modules]: load craype-accel-amd-gfx90a and rocm/5.1.0
  • [wrappers]: mpicc/mpicxx/ftn
  • [C compiler]: set MPICH_CC=/opt/rocm-5.1.0/bin/hipcc
  • [C++ compiler]: set MPICH_CXX=/opt/rocm-5.1.0/bin/hipcc
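Put together, the updated settings could be applied along these lines. This is a sketch under assumptions: the modules, wrappers, and MPICH_CC/MPICH_CXX values are the ones listed above, while the source directory, install prefix, and the RUN dry-run helper are hypothetical. No extra compiler flags are assumed, since the update does not list any.

```shell
#!/bin/sh
# Sketch: rebuild ADIOS2 to match the updated frontier-scream-gpu settings.
# RUN is a hypothetical dry-run switch: unset -> print commands, RUN= -> execute them.
RUN=${RUN-echo}

# Hypothetical locations; adjust to the actual checkout and install prefix.
ADIOS2_SRC=${ADIOS2_SRC:-./ADIOS2}
ADIOS2_PREFIX=${ADIOS2_PREFIX:-./adios2-install}

# [modules]: rocm/5.1.0 instead of rocm/5.4.0.
$RUN module load craype-accel-amd-gfx90a rocm/5.1.0

# Make both MPICH wrappers invoke hipcc. Overriding only MPICH_CXX is what
# led to the undefined __cray_sset_detect symbol reported above.
export MPICH_CC=/opt/rocm-5.1.0/bin/hipcc
export MPICH_CXX=/opt/rocm-5.1.0/bin/hipcc

# [wrappers]: mpicc/mpicxx/ftn instead of cc/CC/ftn.
CC=mpicc CXX=mpicxx FC=ftn $RUN cmake -S "$ADIOS2_SRC" -B "$ADIOS2_SRC/build" \
  -DCMAKE_INSTALL_PREFIX="$ADIOS2_PREFIX"

$RUN cmake --build "$ADIOS2_SRC/build"
$RUN cmake --install "$ADIOS2_SRC/build"
```

Exporting MPICH_CC and MPICH_CXX before configuring keeps the C and C++ objects in the ADIOS2 archives built by the same underlying compiler that SCREAM uses at link time.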

dqwu, Feb 01 '24 18:02

Update: It appears that the workaround is no longer needed to build ADIOS2 2.10.1 on Frontier for SCREAM. Therefore, this issue can be closed for now.

dqwu, Nov 14 '24 19:11