qmcpack icon indicating copy to clipboard operation
qmcpack copied to clipboard

[WIP] Use HIP as language in CMake

Open quantumsteve opened this issue 3 years ago • 15 comments

Please review the developer documentation on the wiki of this project that contains help and requirements.

Proposed changes

Describe what this PR changes and why. If it closes an issue, link to it here with a supported keyword.

Fixes #3581

What type(s) of changes does this code introduce?

Delete the items that do not apply

  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes

Does this introduce a breaking change?

  • Yes (requires CMake 3.21 or later)

What systems has this change been tested on?

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is simply a reminder of what we are going to look for before merging your code.

  • Yes. This PR is up to date with current the current state of 'develop'
  • No. Documentation has been added (if appropriate)

quantumsteve avatar Dec 08 '21 20:12 quantumsteve

Can one of the admins verify this patch?

qmc-robot avatar Dec 08 '21 20:12 qmc-robot

@ye-luo I'm getting the following warnings when building and suspect it's related to CMAKE_HIP_ARCHITECTURES.

clang-13: warning: argument unused during compilation: '--offload-arch=gfx906' [-Wunused-command-line-argument]
clang-13: warning: argument unused during compilation: '--offload-arch=gfx908' [-Wunused-command-line-argument]

quantumsteve avatar Dec 08 '21 20:12 quantumsteve

Is this OK on spock?

prckent avatar Dec 08 '21 21:12 prckent

@quantumsteve could you paste your cmake line here? I mean on a regular workstation not spock.

I got a strange error

cmake -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DQMC_MPI=0 -DENABLE_CUDA=ON -DQMC_CUDA2HIP=ON ..
...
-- Check for working HIP compiler: /opt/rocm-4.5.0/llvm/bin/clang++
-- Check for working HIP compiler: /opt/rocm-4.5.0/llvm/bin/clang++ - broken
CMake Error at /home/packages/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/cmake-3.21.4-t47xgwhz7ql3mvemzhzc7zwsrly6na3s/share/cmake-3.21/Modules/CMakeTestHIPCompiler.cmake:65 (message):
  The HIP compiler

    "/opt/rocm-4.5.0/llvm/bin/clang++"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/yeluo/opt/cleanup/qmcpack/build_R7_aompbuild_offload_cuda2hip_real_check/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/usr/bin/make -f Makefile cmTC_90a18/fast && /usr/bin/make  -f CMakeFiles/cmTC_90a18.dir/build.make CMakeFiles/cmTC_90a18.dir/build
    make[1]: Entering directory '/home/yeluo/opt/cleanup/qmcpack/build_R7_aompbuild_offload_cuda2hip_real_check/CMakeFiles/CMakeTmp'
    Building HIP object CMakeFiles/cmTC_90a18.dir/testHIPCompiler.hip.o
    /opt/rocm-4.5.0/llvm/bin/clang++ -D__HIP_ROCclr__=1 -I/opt/rocm-4.5.0/hip/include -I/opt/rocm-4.5.0/include -isystem /opt/rocm-4.5.0/hip/../include -isystem /opt/rocm-4.5.0/llvm/lib/clang/13.0.0 -fopenmp--cuda-host-only  --offload-arch=gfx906 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -o CMakeFiles/cmTC_90a18.dir/testHIPCompiler.hip.o -x hip -c /home/yeluo/opt/cleanup/qmcpack/build_R7_aompbuild_offload_cuda2hip_real_check/CMakeFiles/CMakeTmp/testHIPCompiler.hip
    clang-13: error: unknown argument: '-fopenmp--cuda-host-only'
    make[1]: *** [CMakeFiles/cmTC_90a18.dir/build.make:78: CMakeFiles/cmTC_90a18.dir/testHIPCompiler.hip.o] Error 1
    make[1]: Leaving directory '/home/yeluo/opt/cleanup/qmcpack/build_R7_aompbuild_offload_cuda2hip_real_check/CMakeFiles/CMakeTmp'
    make: *** [Makefile:127: cmTC_90a18/fast] Error 2

no idea where -fopenmp--cuda-host-only si from

ye-luo avatar Dec 08 '21 21:12 ye-luo

@ye-luo I had some trouble getting the spacing right between arguments in CMAKE_HIP_FLAGS. Expect it should be -fopenmp --cuda-host-only

quantumsteve avatar Dec 08 '21 22:12 quantumsteve

Last commit fixes that issue on Nitrogen. @prckent I haven't tried building on spock.

Using the same cmake configuration as GitHub Actions:

cmake -GNinja -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DQMC_MPI=0 -DENABLE_CUDA=ON -DQMC_CUDA2HIP=ON -DCMAKE_PREFIX_PATH=/opt/OpenBLAS/0.3.18 -DQMC_COMPLEX=0 -DQMC_MIXED_PRECISION=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo ..

quantumsteve avatar Dec 08 '21 22:12 quantumsteve

I'm getting some linking errors

ld.lld: error: undefined symbol: __tgt_target_data_update_mapper
>>> referenced by test_ompBLAS.cpp
>>>               CMakeFiles/test_omptarget_blas.dir/test_ompBLAS.cpp.o:(void qmcplusplus::test_gemv<float>(int, int, char))

This is on my desktop machine with the aomp 14.0 compiler and rocm 4.5.0. I did compile the aomp offload version successfully a few weeks ago.

markdewing avatar Dec 09 '21 21:12 markdewing

Nevermind ... I was confused about what this support was for. It's not for the offload version.

markdewing avatar Dec 09 '21 21:12 markdewing

Had no luck with gcc

cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DENABLE_CUDA=ON -DQMC_CUDA2HIP=ON -DQMC_MPI=OFF ..

it tries to link programs with clang instead of gcc. Got messages like

[100%] Linking HIP executable test_CUDA

which is expected host code.

ye-luo avatar Dec 09 '21 22:12 ye-luo

Had no luck with gcc

cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DENABLE_CUDA=ON -DQMC_CUDA2HIP=ON -DQMC_MPI=OFF ..

it tries to link programs with clang instead of gcc. Got messages like

[100%] Linking HIP executable test_CUDA

which is expected host code.

This test comment seems to suggest that the link language for mixed targets should always be HIP 😕 https://gitlab.kitware.com/cmake/cmake/-/blob/master/Tests/HIP/MixedLanguage/CMakeLists.txt

quantumsteve avatar Dec 10 '21 15:12 quantumsteve

HIP is only part of the program. Having HIP compiler (and its flags) to take over linking is just so wrong.

ye-luo avatar Dec 10 '21 15:12 ye-luo

HIP is only part of the program. Having HIP compiler (and its flags) to take over linking is just so wrong.

HIP, CXX, and CUDA have values of 90, 30, and 15. https://gitlab.kitware.com/cmake/cmake/-/blob/a6fa3fa136c291c36aefbc79b62995a63bf9107b/Modules/CMakeHIPCompiler.cmake.in#L27

quantumsteve avatar Dec 10 '21 17:12 quantumsteve

Unfortunately, I found the current ROCm 4.5 release is bad in many ways for adopting HIP as a language. I have to park this PR for the moment. https://github.com/RadeonOpenCompute/ROCm/issues/1259#issuecomment-991753189

ye-luo avatar Dec 11 '21 19:12 ye-luo

@ye-luo Thank you for reviewing this. I'll move onto other issues while we wait for a more usable ROCm release.

quantumsteve avatar Dec 11 '21 21:12 quantumsteve

temporarily close or mark this [WIP] it's clutter up the PR list.

PDoakORNL avatar Dec 17 '21 15:12 PDoakORNL

Test this please

ye-luo avatar Nov 17 '22 03:11 ye-luo

The latest clang 16 based AOMP 16.0-2 doesn't accept a clang_rt.builtins from rocm shipped clang 15. It seems that the cmake using hip as a language in rocm 5.3 picks up the clang_rt.builtins from the compiler rather than the rocm directory. This is actually the right thing to do because clang_rt.builtins is a compiler built-in library. So I think it is time to turn on HIP as a language in CMake. Thus @williamfgc requires an update to cmake 3.21 on the olcf CI.

ye-luo avatar Nov 17 '22 03:11 ye-luo

Test this please

ye-luo avatar Nov 28 '22 18:11 ye-luo

Test this please

ye-luo avatar Nov 28 '22 20:11 ye-luo

Test this please

ye-luo avatar Nov 28 '22 22:11 ye-luo

@quantumsteve @markdewing I think it is good now. We can use HIP as a cmake language. Since I did many changes. I should not approval my own changes. So please do another round of review.

ye-luo avatar Nov 28 '22 22:11 ye-luo

@prckent please review the documentation update. crusher recipe has been updated to use rocm 5.3.0

ye-luo avatar Nov 29 '22 20:11 ye-luo

@quantumsteve @markdewing I think it is good now. We can use HIP as a cmake language. Since I did many changes. I should not approval my own changes. So please do another round of review.

LGTM, though GitHub won't let me approve a PR I'm the original author of.

quantumsteve avatar Nov 30 '22 16:11 quantumsteve

Confirmed good with rocm 5.4.0 as well.

ye-luo avatar Nov 30 '22 23:11 ye-luo

Test this please

ye-luo avatar Nov 30 '22 23:11 ye-luo