hipFFT
[Issue]: Building hipFFT on NVIDIA platform. [Perlmutter supercomputer]
Problem Description
I am trying to build hipfft/rocm-5.5.1 on NVIDIA A100 GPUs available on the Perlmutter supercomputer. I already have cuda/12.2 and the corresponding cuFFT in my path. There is also hipcc/5.5.1 that is configured with the said cuda version. Here is the CMake Command:
cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release -DBUILD_WITH_LIB=CUDA -DCMAKE_INSTALL_PREFIX=$PWD/../install -L ../
The error
-- Found ROCm
CMake Error at /global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/CMakeFindDependencyMacro.cmake:47 (find_package):
By not providing "Findamd_comgr.cmake" in CMAKE_MODULE_PATH this project
has asked CMake to find a package configuration file provided by
"amd_comgr", but CMake did not find one.
Could not find a package configuration file provided by "amd_comgr" with
any of the following names:
amd_comgrConfig.cmake
amd_comgr-config.cmake
Add the installation prefix of "amd_comgr" to CMAKE_PREFIX_PATH or set
"amd_comgr_DIR" to a directory containing one of the above files. If
"amd_comgr" provides a separate development package or SDK, be sure it has
been installed.
Call Stack (most recent call first):
/global/common/software/nersc/pe/rocm/5.5.1/lib64/cmake/hip/hip-config.cmake:183 (find_dependency)
library/CMakeLists.txt:34 (find_package)
Operating System
SLES 15-SP4
CPU
AMD EPYC 7713 64-Core Processor
GPU
AMD Instinct MI250X
ROCm Version
ROCm 5.5.1
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
FYI - I did not find the right GPU option in the selection, so I selected randomly in order to be able to submit the issue.
If you set HIP_PLATFORM=nvidia in the environment, does that make a difference?
It's already set and it did not make any difference. It's usually set in our environment whenever hip-rocm modules are loaded.
Hmm, can you try commenting out the find_package(HIP REQUIRED) call at library/CMakeLists.txt:34? Now that I look, it doesn't seem like it should be necessary.
@rgayatri23 Please try this:
module purge
module load cuda hip-cuda boost cmake fftw
export HIP_PLATFORM=nvidia
cmake -DROCM_DIR=<PATH_TO_HIPCUDA> -DCMAKE_MODULE_PATH=<PATH_TO_HIPCUDA>/hip/cmake/ -DCMAKE_CXX_COMPILER=hipcc -DHIP_ROOT_DIR=<PATH_TO_HIPCUDA> -DBUILD_WITH_LIB=CUDA -DBUILD_CLIENTS=ON -DCMAKE_CXX_FLAGS="-gencode=arch=compute_80,code=sm_80" ..
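Before running the cmake line above, it can help to confirm that the environment matches what the command expects. The following is a minimal sketch, not part of hipFFT: the helper names `missing_tools` and `platform_is_nvidia` are hypothetical, and the only assumptions are that `hipcc`, `nvcc` and `cmake` should be on PATH and that HIP_PLATFORM should equal `nvidia`.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of hipFFT or ROCm.

# missing_tools NAME...: print "missing: NAME" for every command that is
# not found on PATH; return non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# platform_is_nvidia: succeed only when HIP_PLATFORM is exactly "nvidia".
platform_is_nvidia() {
    [ "${HIP_PLATFORM:-}" = "nvidia" ]
}
```

A possible use after loading the modules is `missing_tools hipcc nvcc cmake && platform_is_nvidia || echo "fix the environment first"`.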
Thanks @af-ayala. This time the build got further but was blocked by a different issue, so partial success! CMake is unable to find FFTW, even though it's definitely in the path:
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11.0")
CMake Error at /global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find FFTW (missing: FFTW_INCLUDE_DIRS FFTW_LIBRARIES) (Required
is at least version "3.0")
Call Stack (most recent call first):
/global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
clients/cmake/FindFFTW.cmake:103 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
clients/tests/CMakeLists.txt:26 (find_package)
-- Configuring incomplete, errors occurred!
See also "/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build/CMakeFiles/CMakeOutput.log".
See also "/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build/CMakeFiles/CMakeError.log".
rgayatri@perlmutter:login40:/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build> echo $CPATH
/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include:/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/include
rgayatri@perlmutter:login40:/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build> ls /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/*cufft*
.rw-r--r-- 12k root 29 Sep 2023 /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufft.h
.rw-r--r-- 19k root 29 Sep 2023 /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufftw.h
.rw-r--r-- 12k root 29 Sep 2023 /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufftXt.h
/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufftmp:
.rw-r--r-- 4.1k root 29 Sep 2023 cudalibxt.h
.rw-r--r-- 12k root 29 Sep 2023 cufft.h
.rw-r--r-- 5.1k root 29 Sep 2023 cufftMp.h
.rw-r--r-- 19k root 29 Sep 2023 cufftw.h
.rw-r--r-- 12k root 29 Sep 2023 cufftXt.h
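Note that the listing above shows cuFFT headers (including `cufftw.h`, cuFFT's FFTW-compatible interface), while the CMake error concerns FFTW itself. A small sketch for checking which directories in a colon-separated list such as CPATH contain a given header follows. The function name `find_header` is hypothetical, and looking for `fftw3.h` assumes that is the header the FindFFTW module needs; whether that module consults CPATH at all is not confirmed here.

```shell
#!/bin/sh
# Hypothetical helper; not part of hipFFT or CMake.

# find_header LIST NAME: print every directory in the colon-separated
# LIST that contains a regular file called NAME; return non-zero if
# no directory does.
find_header() {
    list=$1
    name=$2
    found=1
    old_ifs=$IFS
    IFS=:
    for dir in $list; do
        if [ -n "$dir" ] && [ -f "$dir/$name" ]; then
            echo "$dir"
            found=0
        fi
    done
    IFS=$old_ifs
    return "$found"
}
```

For example, `find_header "$CPATH" fftw3.h` and `find_header "$CPATH" cufft.h` show whether the FFTW and cuFFT headers are both visible through CPATH.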
Did you build FFTW yourself, or are you using the SLES packages? The distro packages are easier to use since they include both single and double precision libraries.
The GPU software is all built through the distro packages.
If you just want to build the library, setting -DBUILD_CLIENTS=OFF will get you that. Using modules on supercomputers can be tricky. To build our testing infrastructure with -DBUILD_CLIENTS=ON, you do need the dependencies that are producing these errors. I would suggest the following procedure, which works for me on other clusters:
- Get the modules you need. spider will tell you what you need to load first, e.g. 'ums/default':
module spider fftw
module load ums/default
module purge
module load boost googletest
module load fftw/3.3.10
Even with -DBUILD_CLIENTS=OFF, CMake is still looking for cuFFT. Is there a CMake variable for passing its path? I have tried everything from adding the path to CMAKE_PREFIX_PATH to passing it in the CXX and linker flags, but the path does not seem to be picked up.
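One thing that may be worth trying is CMake's generic CMAKE_LIBRARY_PATH variable, which adds directories to the search performed by `find_library`. The sketch below only locates the directory holding `libcufft.so` under a given root. The function name `cufft_libdir` is hypothetical, and whether the hipFFT build locates cuFFT through `find_library` (and therefore honours this variable) is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper; not part of hipFFT or the CUDA toolkit.

# cufft_libdir ROOT: search ROOT recursively for libcufft.so* and print
# the directory of the first match; return non-zero if nothing is found.
cufft_libdir() {
    root=$1
    lib=$(find "$root" -name 'libcufft.so*' 2>/dev/null | head -n 1)
    [ -n "$lib" ] || return 1
    dirname "$lib"
}
```

With the HPC SDK layout shown earlier, a possible invocation is `cmake -DCMAKE_LIBRARY_PATH="$(cufft_libdir /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2)" ...`, with the remaining options unchanged.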
@rgayatri23 Can you please check if you are still seeing the issue with the latest ROCm 6.1.2? Thanks!
This has been stale for a while; closing for now. Feel free to re-open if there's still a problem!
Sure. Sorry about the delay. I am having issues building rocm/6.0 on the NVIDIA platform. I will test this again once that is done.