SYCL HIP Compiler on Corona

Open rchen20 opened this issue 2 years ago • 9 comments

Instructions for building a working SYCL compiler on Corona:

  1. module load gcc/10.3.1-magic
  2. git clone https://github.com/intel/llvm -b sycl
  3. cd llvm
  4. srun -n1 /usr/bin/python3 buildbot/configure.py --hip -o buildrocm5.7.1 \
       --cmake-gen "Unix Makefiles" \
       --cmake-opt=-DSYCL_BUILD_PI_HIP_ROCM_DIR=/opt/rocm-5.7.1 \
       --cmake-opt=-DSYCL_BUILD_PI_HIP_ROCM_INCLUDE_DIR=/opt/rocm-5.7.1/include \
       --cmake-opt=-DSYCL_BUILD_PI_HIP_ROCM_LIB_DIR=/opt/rocm-5.7.1/lib \
       --cmake-opt=-DSYCL_BUILD_PI_HIP_INCLUDE_DIR=/opt/rocm-5.7.1/include \
       --cmake-opt=-DSYCL_BUILD_PI_HIP_HSA_INCLUDE_DIR=/opt/rocm-5.7.1/hsa/include/hsa \
       --cmake-opt=-DSYCL_BUILD_PI_HIP_LIB_DIR=/opt/rocm-5.7.1/lib \
       --cmake-opt=-DUR_HIP_ROCM_DIR=/opt/rocm-5.7.1 \
       --cmake-opt=-DUR_HIP_INCLUDE_DIR=/opt/rocm-5.7.1/include \
       --cmake-opt=-DUR_HIP_HSA_INCLUDE_DIR=/opt/rocm-5.7.1/hsa/include/hsa \
       --cmake-opt=-DUR_HIP_LIB_DIR=/opt/rocm-5.7.1/lib
  5. srun -n1 /usr/bin/python3 buildbot/compile.py -o buildrocm5.7.1
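
The ROCm prefix appears ten times in step 4, so a version bump is easy to get wrong. As a sketch only, the helper below prints the same ten `--cmake-opt` flags for any prefix. The name `rocm_cmake_opts` is hypothetical (it is not part of intel/llvm), and the helper assumes the prefix contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: emit the --cmake-opt flags from step 4 for one ROCm prefix.
rocm_cmake_opts() {
    rocm=$1
    printf '%s\n' \
        "--cmake-opt=-DSYCL_BUILD_PI_HIP_ROCM_DIR=$rocm" \
        "--cmake-opt=-DSYCL_BUILD_PI_HIP_ROCM_INCLUDE_DIR=$rocm/include" \
        "--cmake-opt=-DSYCL_BUILD_PI_HIP_ROCM_LIB_DIR=$rocm/lib" \
        "--cmake-opt=-DSYCL_BUILD_PI_HIP_INCLUDE_DIR=$rocm/include" \
        "--cmake-opt=-DSYCL_BUILD_PI_HIP_HSA_INCLUDE_DIR=$rocm/hsa/include/hsa" \
        "--cmake-opt=-DSYCL_BUILD_PI_HIP_LIB_DIR=$rocm/lib" \
        "--cmake-opt=-DUR_HIP_ROCM_DIR=$rocm" \
        "--cmake-opt=-DUR_HIP_INCLUDE_DIR=$rocm/include" \
        "--cmake-opt=-DUR_HIP_HSA_INCLUDE_DIR=$rocm/hsa/include/hsa" \
        "--cmake-opt=-DUR_HIP_LIB_DIR=$rocm/lib"
}

# Usage (substitution left unquoted on purpose so each flag is its own argument):
#   srun -n1 /usr/bin/python3 buildbot/configure.py --hip -o buildrocm5.7.1 \
#       --cmake-gen "Unix Makefiles" $(rocm_cmake_opts /opt/rocm-5.7.1)
```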

Using the compiler:

A. Run the Corona script in the RAJA repo:

module load rocm/5.7.1
cd raja
./scripts/lc-builds/corona_sycl.sh /usr/workspace/raja-dev/clang_sycl_2f03ef85fee5_hip_gcc10.3.1_rocm5.7.1
cd {build directory}
make -j

Confirmed with rocprof that the GPU is being used in the RAJA test.

rchen20 avatar Apr 22 '22 02:04 rchen20

@homerdin @trws @rhornung67 @artv3 @davidbeckingsale Moving the discussion from email to this GitHub issue. The questions I posed in the email are:

If we want to use this version of clang, should I install it in a public directory? Is there interest in having the same SYCL compiler with a CUDA backend? I think Pascal is the platform for an Intel-CUDA configuration, but it has an old NVIDIA card and would require an updated ROCm.

rchen20 avatar Apr 22 '22 03:04 rchen20

@artv3 I've rebuilt and installed the latest version of the SYCL HIP compiler on Corona in /usr/workspace/raja-dev/clang_sycl_hip_gcc10.2.1_rocm5.1.0/install. You can try the compiler out by following steps A and C in the top comment on this issue. Let me know if you run into any problems.

rchen20 avatar Jun 15 '22 01:06 rchen20

I hit the following build errors:

clang-14: error: unknown argument: '-fsycl-unnamed-lambda'
clang-14: error: unknown argument: '-fsycl-targets=amdgcn-amd-amdhsa'

artv3 avatar Jun 20 '22 16:06 artv3

Whoops, I've now added group read and execute permissions to the compiler directory.
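
A small pre-flight check can catch missing permissions on a shared compiler install before a full RAJA build fails with confusing errors. This is only a sketch: the function name `check_sycl_compiler` is hypothetical, and `bin/clang++` under the install prefix is an assumed layout.

```shell
#!/bin/sh
# Hypothetical pre-flight check: can the current user run the compiler under PREFIX?
check_sycl_compiler() {
    cxx="$1/bin/clang++"   # assumed layout of the install prefix
    if [ ! -e "$cxx" ]; then
        # Also reported when a parent directory is not searchable by this user.
        echo "missing: $cxx"
        return 1
    fi
    if [ ! -x "$cxx" ]; then
        echo "not executable by $(id -un): $cxx"
        return 1
    fi
    echo "ok: $cxx"
}

# Usage:
#   check_sycl_compiler /path/to/install
```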

rchen20 avatar Jun 20 '22 17:06 rchen20

Permissions forall!

artv3 avatar Jun 20 '22 17:06 artv3

Notes to self:

  • Potential workaround to current SYCL compiler build issues with HIP https://github.com/intel/llvm/issues/11873
  • Need at least rocm/5.6.0 due to SYCL's usage of specific API calls (e.g. hipArray3DGetDescriptor)
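
The 5.6.0 floor can be enforced in a build script before configuring. The sketch below compares dotted version strings with `sort -V`, which is a GNU coreutils extension and is assumed to be available; both function names are hypothetical.

```shell
#!/bin/sh
# Succeed when version $1 >= version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical guard using the 5.6.0 floor noted above.
require_rocm() {
    if version_ge "$1" 5.6.0; then
        echo "rocm/$1 is new enough"
    else
        echo "rocm/$1 is too old, need >= 5.6.0"
        return 1
    fi
}

# Usage:
#   require_rocm 5.7.1 || exit 1
```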

rchen20 avatar Nov 30 '23 22:11 rchen20

Preserving the instructions from 2022 for posterity:

Instructions for building a working SYCL compiler on Corona:

  1. module load rocm/5.1.0
  2. module load gcc-tce/10.2.1
  3. module load mvapich2-tce/2.3.6
  4. git clone https://github.com/intel/llvm -b sycl
  5. cd llvm
  6. /usr/bin/python3 buildbot/configure.py --hip -o buildrocm5.1.0 \
       --cmake-gen "Unix Makefiles" \
       --cmake-opt="-DSYCL_BUILD_PI_HIP_INCLUDE_DIR=/opt/rocm-5.1.0/hip/include" \
       --cmake-opt="-DSYCL_BUILD_PI_HIP_HSA_INCLUDE_DIR=/opt/rocm-5.1.0/hsa/include" \
       --cmake-opt="-DSYCL_BUILD_PI_HIP_AMD_LIBRARY=/opt/rocm-5.1.0/lib/libamdhip64.so"
  7. /usr/bin/python3 buildbot/compile.py -o buildrocm5.1.0

Using the compiler:

A. Change paths to point to clang-sycl installation:

export PATH=/path/to/buildrocm5.1.0/install/bin:$PATH
export LD_LIBRARY_PATH=/path/to/buildrocm5.1.0/install/lib:$LD_LIBRARY_PATH

B. Create a softlink to lld in the install/bin directory (or set PATH to the new bin directory in the previous step):

ln -s /path/to/buildrocm5.1.0/bin/lld /path/to/buildrocm5.1.0/install/bin
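
Steps A and B can be folded into one helper that takes the build directory as its only argument. This is a sketch, not the script used for the install: the function name `use_sycl_build` is hypothetical, and it reads the alternative in step B as putting the build tree's own `bin` directory (where `lld` lives) on `PATH` instead of creating the symlink.

```shell
#!/bin/sh
# Hypothetical helper: put a clang-sycl build on PATH and LD_LIBRARY_PATH.
# $1 is the build directory (e.g. /path/to/buildrocm5.1.0).
use_sycl_build() {
    build=$1
    # install/bin holds the compiler; bin holds lld, so no symlink is needed.
    PATH="$build/install/bin:$build/bin:$PATH"
    # Avoid a trailing ':' (an empty entry) when LD_LIBRARY_PATH was unset or empty.
    LD_LIBRARY_PATH="$build/install/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

# Usage:
#   use_sycl_build /path/to/buildrocm5.1.0
```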

C. Compile RAJA with corona_sycl script (https://github.com/LLNL/RAJA/pull/1254).

Caveat - Need to comment out __CUDA_ARCH__ lines in policy/atomic_auto.hpp. For some reason, the clang compiler is picking up __CUDA_ARCH__.

Also, apparently some nodes on Corona have non-functional GPUs, so some allocations will produce segfaulting runs.
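
One cheap sanity check before running on a new allocation is to count the GPU agents that `rocminfo` reports. The sketch below assumes GPU agents appear with a line of the form `Name: gfx...` in `rocminfo` output; the function name is hypothetical. A GPU can be listed and still fault at run time, so this only catches devices that are missing from the listing.

```shell
#!/bin/sh
# Count GPU agents in `rocminfo` output read from stdin.
# Assumes GPU agents are listed with a line such as "  Name:   gfx906".
count_gpu_agents() {
    # grep -c prints 0 and exits non-zero on no match; keep the status clean.
    grep -Ec '^[[:space:]]*Name:[[:space:]]+gfx' || true
}

# Usage on a compute node:
#   rocminfo | count_gpu_agents
```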

rchen20 avatar Dec 05 '23 21:12 rchen20

Can we use a newer version of ROCm? 5.1 is old and we'll soon be moving to 6.0. Does 5.6 work?

rhornung67 avatar Dec 08 '23 21:12 rhornung67

Yes, if you look at the comment on top, I'm currently using rocm/5.6.0. The rocm/5.1.0 stuff is what I did last year, and I'm keeping those instructions just in case we need to refresh ourselves on how to build the compiler.

rchen20 avatar Dec 08 '23 21:12 rchen20