
OpenMP Linking Error with Clang/16.0.6 + CUDA/11.2.0

Open rchen20 opened this issue 1 year ago • 9 comments

Strange OpenMP linking error with the next set of default compilers on BlueOS (clang/16.0.6, cuda/11.2.0, gcc/8.3.1). Will ping Gyllenhaal about this, but maybe @trws has seen this before?

I've tried manually including several other OpenMP paths, but keep getting the same error. The error disappears when I turn off CUDA compilation in RAJA (but leave OpenMP on). Also tried this with the latest Camp and BLT, but same result.

[  0%] Building CUDA object CMakeFiles/RAJA.dir/src/LockFreeIndexSetBuilders.cpp.o
/usr/tce/packages/cuda/cuda-11.2.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/tce/packages/clang/clang-ibm-16.0.6-cuda-11.2.0-gcc-8.3.1/bin/clang++  -I/usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/include -I/usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/build_lc_blueos-nvcc11.2.0-70-clangibm-16.0.6-cuda-11.2.0-gcc-8.3.1/include -I/usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/tpl/camp/include -I/usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/build_lc_blueos-nvcc11.2.0-70-clangibm-16.0.6-cuda-11.2.0-gcc-8.3.1/tpl/camp/include -isystem=/usr/tce/packages/cuda/cuda-11.2.0/include -restrict --expt-extended-lambda --expt-relaxed-constexpr -Xcudafe "--display_error_number" -O3 -Xcompiler -O3 -Xcompiler -fopenmp --generate-code=arch=compute_70,code=[compute_70,sm_70] -Xcompiler=-fPIC -Xcompiler=-fopenmp=libomp -std=c++14 -MD -MT CMakeFiles/RAJA.dir/src/LockFreeIndexSetBuilders.cpp.o -MF CMakeFiles/RAJA.dir/src/LockFreeIndexSetBuilders.cpp.o.d -x cu -c /usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/src/LockFreeIndexSetBuilders.cpp -o CMakeFiles/RAJA.dir/src/LockFreeIndexSetBuilders.cpp.o
/usr/tce/packages/clang/clang-ibm-16.0.6/release/lib/clang/16/include/omp.h(504): error: linkage specification is incompatible with previous "omp_is_initial_device"
(134): here

1 error detected in the compilation of "/usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/src/LockFreeIndexSetBuilders.cpp".

rchen20 avatar Feb 07 '24 01:02 rchen20

First thing here, it isn't a linker error. This is an error caused by the C++ linkage specification being different between two different declarations in the same translation unit while compiling. Normally with clang that would print the source locations of both declarations. My guess would be the situation is two different headers, one that has int omp_is_initial_device(); and one that has inline int omp_is_initial_device(); (inline requires different linkage, hence the error) or similar. If you can repro this with just clang and not nvcc, you'll get a much better error message.
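The kind of conflict described above can be reproduced in isolation. What follows is a minimal sketch, not the contents of any real omp.h: the `demo_` names and the `DEMO_TRIGGER_CONFLICT` macro are hypothetical, chosen only for illustration. It assumes the second declaration differs by `static`, which matches the keyword later reported as removed from the header.

```cpp
#include <cassert>

// Hypothetical stand-in for an OpenMP header. The demo_ names are
// illustrative and are not taken from any real omp.h.
extern "C" int demo_is_initial_device(void);  // first declaration: external linkage

#ifdef DEMO_TRIGGER_CONFLICT
// A second declaration of the same name with internal linkage in the same
// translation unit. Compilers reject this; the wording of the diagnostic
// differs between front ends.
static inline int demo_is_initial_device(void) { return 1; }
#else
// A definition that agrees with the first declaration compiles cleanly.
extern "C" int demo_is_initial_device(void) { return 1; }
#endif

// Small self-check that exercises the consistent declaration.
inline void demo_self_check() { assert(demo_is_initial_device() == 1); }
```

Compiling this as written should succeed. Defining `DEMO_TRIGGER_CONFLICT` on the command line should make the translation unit fail, and trying that with both the host compiler and nvcc is one way to compare how each front end reports the problem.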

trws avatar Feb 07 '24 01:02 trws

Mike Collette recommended adding the -fopenmp-version=45 flag to the link line, and it resolved the error. I recall needing to do this for Fortran+CUDA compilation, but I didn't think we'd need to do this for RAJA . . .

rchen20 avatar Feb 07 '24 02:02 rchen20

@rchen20 if the issue is resolved with the link line flag, please put up a PR with that change applied to the appropriate build script and/or host-config file.

rhornung67 avatar Feb 07 '24 16:02 rhornung67

> @rchen20 if the issue is resolved with the link line flag, please put up a PR with that change applied to the appropriate build script and/or host-config file.

The flag does resolve the compilation problem, but I don't think it should be a flag we normally use. We use this flag in Teton to make sure the Fortran and CUDA compilers both agree on a version of the OpenMP standard to use. I wouldn't expect RAJA to require this flag as well. I'm happy to apply this band-aid, but it would be good if maybe @trws weighed in?
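For context on what "agreeing on a version of the OpenMP standard" means at the source level: a compiler with OpenMP enabled defines the `_OPENMP` macro to a yyyymm date that identifies the specification version, and headers can use that value to enable or skip newer declarations. The sketch below assumes that `-fopenmp-version=45` makes clang report the 4.5 date; the helper names are hypothetical and are not part of RAJA or of any OpenMP runtime.

```cpp
#include <cassert>
#include <string>

// Map an _OPENMP date value (yyyymm) to the specification version it denotes.
inline std::string demo_openmp_spec_version(long macro_value) {
  switch (macro_value) {
    case 201107: return "3.1";
    case 201307: return "4.0";
    case 201511: return "4.5";
    case 201811: return "5.0";
    case 202011: return "5.1";
    case 202111: return "5.2";
    default:     return "unknown";
  }
}

// Value of _OPENMP seen by this translation unit, or 0 when OpenMP is off.
inline long demo_openmp_macro_in_this_tu() {
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}

// Small self-check of the mapping for the version discussed in this thread.
inline void demo_version_self_check() {
  assert(demo_openmp_spec_version(201511) == "4.5");
}
```

Printing `demo_openmp_spec_version(demo_openmp_macro_in_this_tu())` from a file built once through nvcc and once through the host compiler alone would show whether the two agree on the version.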

rchen20 avatar Feb 07 '24 17:02 rchen20

Does it work with regular (i.e., non-IBM clang) without the flag?

rhornung67 avatar Feb 07 '24 17:02 rhornung67

Strange, I get the same error with non-IBM clang, and adding the flag solves the problem but begets another set of template errors:

[  1%] Building CXX object blt/thirdparty_builtin/googletest/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
cd /usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/build_lc_blueos-nvcc11.2.0-70-clang16.0.6/blt/thirdparty_builtin/googletest/googletest && /usr/tce/packages/clang/clang-16.0.6/bin/clang++ -DGTEST_HAS_DEATH_TEST=1 -I/usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/blt/thirdparty_builtin/googletest/googletest/include -I/usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/blt/thirdparty_builtin/googletest/googletest -Wall -Wextra      -O3 -march=native -funroll-loops -finline-functions -fPIC -Wall -Wshadow -Wconversion -Wundef -DGTEST_HAS_PTHREAD=1 -fexceptions -W -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wredundant-decls -std=c++14 -MD -MT blt/thirdparty_builtin/googletest/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o -MF CMakeFiles/gtest.dir/src/gtest-all.cc.o.d -o CMakeFiles/gtest.dir/src/gtest-all.cc.o -c /usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/blt/thirdparty_builtin/googletest/googletest/src/gtest-all.cc
clang-16: warning: argument unused during compilation: '-march=native' [-Wunused-command-line-argument]
In file included from /usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/blt/thirdparty_builtin/googletest/googletest/src/gtest-all.cc:38:
In file included from /usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/blt/thirdparty_builtin/googletest/googletest/include/gtest/gtest.h:65:
In file included from /usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/blt/thirdparty_builtin/googletest/googletest/include/gtest/gtest-death-test.h:43:
In file included from /usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/blt/thirdparty_builtin/googletest/googletest/include/gtest/internal/gtest-death-test-internal.h:47:
/usr/workspace/wsrzc/chen59/allraja/rajablt/raja_git_desulompfix/blt/thirdparty_builtin/googletest/googletest/include/gtest/gtest-matchers.h:398:49: error: 'M' does not refer to a value
           std::is_trivially_copy_constructible<M>::value &&
                                                ^

On the other hand, IBM clang compiles successfully with the flag, and passes all the tests (but the OpenMP tests are 2-4x slower than usual).

rchen20 avatar Feb 07 '24 17:02 rchen20

That compile command doesn't include the flag; what are you testing here? My first guess as to the cause is that nvcc is reading the wrong OpenMP headers (the gcc libgomp headers rather than the clang libomp headers) while clang is loading its own. Then again, if you use xl or clang-ibm, they are set up to use lomp, the IBM runtime, which has yet another set of headers. If in any of these cases nvcc gets a different set of headers in preprocessing than clang or xl does, it will likely break either in compilation, because the declarations aren't portable, or in linking, when the symbols don't match up.
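One rough way to test the header-mismatch guess is to have each compiler report which flavor of omp.h it picked up. The sketch below rests on two assumptions: that the LLVM-derived omp.h defines `KMP_VERSION_MAJOR` while the libgomp header does not, and that the compiler supports `__has_include`. The `demo_` and `DEMO_` names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Include omp.h only when the compiler can find one, so the file still
// builds where no OpenMP header is installed.
#if defined(__has_include)
#  if __has_include(<omp.h>)
#    include <omp.h>
#    define DEMO_HAVE_OMP_H 1
#  endif
#endif

// Describe the omp.h seen by this translation unit.
inline std::string demo_openmp_header_flavor() {
#if !defined(DEMO_HAVE_OMP_H)
  return "no omp.h found";
#elif defined(KMP_VERSION_MAJOR)
  // Assumption: this macro is only defined by the LLVM-derived header.
  return "LLVM-style omp.h";
#else
  return "other omp.h (for example libgomp)";
#endif
}

// Small self-check: some description is always produced.
inline void demo_flavor_self_check() {
  assert(!demo_openmp_header_flavor().empty());
}
```

If the string differs between a build through nvcc with `-ccbin` and a build with the host compiler alone, the two are not preprocessing the same header.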

The specific issue you just posted @rchen20 has nothing whatever to do with OpenMP though. It's a template parameter that somehow isn't getting handled right, which shouldn't be in any way related to this. Does the google test build work with this compiler normally?

trws avatar Feb 07 '24 18:02 trws

After trying again with the latest RAJA/develop, I'm getting the same linkage specification error as before, with the -fopenmp-version=45 flag being a successful hack to get around it. I think Tom is correct in that perhaps nvcc and clang disagree on how omp_is_initial_device is decorated with inline in their respective versions of the headers.

On the other hand, building and testing with clang/ibm-16.0.6-cuda-11.8.0-gcc-11.2.1 works. I'll check with Gyllenhaal on whether this will be (hopefully) the new default on BlueOS, and maybe we won't need to worry about the faulty clang/ibm-16.0.6-cuda-11.2.0-gcc-8.3.1.

rchen20 avatar Mar 08 '24 19:03 rchen20

Thanks for looking into this and the updated info. When you find out what the defaults will be, let us know and we will add those configurations to our CI testing.

rhornung67 avatar Mar 08 '24 19:03 rhornung67

Closing this because it was solved by Roy Musselman and John Gyllenhaal, and I've tested this with the latest RAJA.

From Roy:

I've modified the clang omp.h file by removing the static keyword.  I've updated clang-16.0.6 and clang-ibm-16.0.6 on rzansel. I successfully ran a simple reproducer.

rchen20 avatar Mar 12 '24 20:03 rchen20

Am I reading this right, Roy seriously broke the linkage specification for omp_is_initial_device in the header? Does it work with pure clang builds after this? If so we might get away with it but the description for this fix is deeply concerning.

trws avatar Mar 12 '24 23:03 trws

> Am I reading this right, Roy seriously broke the linkage specification for omp_is_initial_device in the header? Does it work with pure clang builds after this? If so we might get away with it but the description for this fix is deeply concerning.

Hmm, I'm not sure how John tested it, but Roy tested it with clang and clang-ibm which both worked.

rchen20 avatar Mar 12 '24 23:03 rchen20

Ok, since this is specifically a lassen problem, let's not worry about it unless something comes up.

trws avatar Mar 13 '24 00:03 trws