DCU: Cannot find GPU on this computer!
Details
I installed ABACUS v3.9.0 on Ubuntu 24.04. My machine has an AMD CPU and a DCU. When I compiled the CPU version using the toolchain method, the program ran successfully. However, I failed to compile the DCU-supported version with the toolchain, so I switched to a manual CMake build. My configure command was:
CC=clang CXX=clang++ cmake -B build -S /opt/abacus-develop -DUSE_OPENMP=OFF -DENABLE_LCAO=OFF -DFFT3W_DIR=$INSTALL_DIR/fftw-3.3.10 -DLAPACK_DIR=$INSTALL_DIR/openblas-0.3.28/lib -DSCALAPACK_DIR=$INSTALL_DIR/scalapack-2.2.1/lib -DUSE_ROCM=ON -DROCM_PATH=$ROCM_PATH -DHIP_PATH=$HIP_PATH
The compilation succeeded, but after I changed DEVICE to GPU in the INPUT file, I got "Cannot find GPU on this computer!" How can I solve this? What should I do if I use the toolchain method? Or how can I fix it with the manual build approach?
Have you read FAQ on the online manual http://abacus.deepmodeling.com/en/latest/community/faq.html
- [x] Yes, I have read the FAQ part on online manual.
Task list for Issue attackers (only for developers)
- [ ] Understand the problem or question described by the user.
- [ ] Check if the issue is a known problem or has been addressed in the documentation.
- [ ] Test the issue or problem on a similar system or environment, if possible.
- [ ] Identify the root cause or provide clarification on the user's question.
- [ ] Provide a step-by-step guide, including any necessary resources, to resolve the issue or answer the question.
- [ ] If the issue is related to documentation, update the documentation to prevent future confusion (optional).
- [ ] If the issue is related to code, consider implementing a fix or improvement (optional).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Ensure the user's issue is resolved or their question is answered and close the ticket.
Hi @1Keria, thanks for your issue!
For now, the ABACUS toolchain fully supports only CUDA-based installation; HIP-based installation (such as DCU) is not fully supported. For DCU compilation, you can still install the dependencies as you would for the CPU version, and then add the DCU options at the cmake -B step for cusolver support.
Also, I notice that you're using clang++ to build ABACUS, but AOCC-AOCL installation of ABACUS is not supported at the moment; see #5982.
Does the community have a tutorial for DCU installation now? @mohanchen @dzzz2001
What's the version of your DCU Toolkit? We used to use DCU hardware on the Sugon computing platform, so we don't have actual DCU hardware at hand, and there may be some differences between the two environments. Please provide your device and DCU Toolkit info to help us handle this problem.
With the latest DCU Toolkit you can compile the CUDA version of ABACUS directly, and it runs faster than the DCU version. We can provide a tutorial if you need one.
Thank you for your help. I've rechecked and confirmed that the DCU version is dtk-24.04.3. My device has two DCUs, model Z100. If possible, could you provide a tutorial?
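The device and toolkit details requested above can be gathered with a short script. This is only a sketch: the tool names `hy-smi`, `rocm-smi`, and `rocminfo` are assumptions about what a DTK or ROCm installation puts on `PATH` (none of them is confirmed in this thread), and `find_device_tool` is a hypothetical helper, not part of ABACUS.

```shell
#!/bin/sh
# Print the name of the first device-query tool found on PATH.
# Assumption: the candidate names passed in are guesses at what the toolkit installs.
find_device_tool() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

if tool=$(find_device_tool hy-smi rocm-smi rocminfo); then
    echo "Using $tool to list devices:"
    "$tool" || echo "$tool exited with an error"
else
    echo "No device-query tool found on PATH; check that the toolkit environment is loaded."
fi
```

If no tool is found, or the tool lists no device, the "Cannot find GPU" message may come from the environment rather than from the ABACUS build itself.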
I don't know whether this feature of the DCU Toolkit can be released publicly, so I will just describe how to do it in this issue.
First, find your DCU Toolkit folder. Under this folder you will see a cuda subfolder. Enter this cuda folder and run
source env.sh
Then enter your abacus-develop folder and compile following the documentation. In short, you need to add these options while configuring:
-DUSE_CUDA_ON_DCU=ON \
-DUSE_CUDA=ON \
-DCMAKE_CUDA_COMPILER={path to dtk}/dtk/24.04.2/cuda/bin/nvcc
And don't turn USE_ROCM on. This should be fine.
If you compile this way, you don't need to use clang as the compiler. You can use
CC=gcc CXX=mpic++
For now we don't provide a toolchain for this, because we really don't know whether this feature of DTK is valid. But a toolchain for directly installing the DCU version may be provided later. Thanks for your issue!
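Putting the steps above together, a configure helper might look like the following. It is a sketch under assumptions: `DTK_ROOT` and `dcu_cuda_cmake_flags` are hypothetical names, the `cuda` subfolder layout is taken from the description above, and the remaining dependency options (LAPACK, ScaLAPACK, FFTW, and so on) still have to be added as described in the documentation. The script only prints the commands so they can be reviewed before running.

```shell
#!/bin/sh
# Print the CMake options described above for a given DCU Toolkit folder.
# Assumption: the toolkit folder contains cuda/env.sh and cuda/bin/nvcc.
dcu_cuda_cmake_flags() {
    dtk_root=$1
    printf '%s\n' \
        "-DUSE_CUDA=ON" \
        "-DUSE_CUDA_ON_DCU=ON" \
        "-DUSE_ROCM=OFF" \
        "-DCMAKE_CUDA_COMPILER=$dtk_root/cuda/bin/nvcc"
}

# DTK_ROOT is a hypothetical variable; point it at your own toolkit folder.
DTK_ROOT=${DTK_ROOT:-/opt/dtk}

echo "# Load the CUDA-compatibility environment, then configure:"
echo ". $DTK_ROOT/cuda/env.sh"
echo "CC=gcc CXX=mpic++ cmake -B build \\"
# Indent each flag and add a line continuation to all but the last one.
dcu_cuda_cmake_flags "$DTK_ROOT" | sed 's/^/    /; $!s/$/ \\/'
```

Keeping the toolkit path in one variable means the `env.sh` that is sourced and the `nvcc` passed to CMake always come from the same toolkit folder.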
I followed your suggestion and changed my commands to:
source /opt/dtk/cuda/env.sh
CC=gcc CXX=mpic++
cmake -B dcu \
  -DCMAKE_INSTALL_PREFIX=/opt/abacus-develop \
  -DCMAKE_C_COMPILER=$CC \
  -DCMAKE_CXX_COMPILER=$CXX \
  -DCMAKE_CUDA_COMPILER=/opt/dtk/cuda/bin/nvcc \
  -DUSE_CUDA=ON \
  -DUSE_CUDA_ON_DCU=ON \
  -DUSE_ROCM=OFF \
  -DLAPACK_DIR=$LAPACK_DIR \
  -DSCALAPACK_DIR=$SCALAPACK_DIR \
  -DFFTW3_DIR=$FFTW3_DIR \
  -DELPA_DIR=$ELPA_DIR \
  -DCEREAL_INCLUDE_DIR=$CEREAL_INCLUDE_DIR \
  -DLibxc_DIR=$LIBXC_DIR \
  -DENABLE_LCAO=ON \
  -DENABLE_LIBXC=ON \
  -DUSE_OPENMP=ON \
  -DUSE_ELPA=ON \
  -DENABLE_RAPIDJSON=ON \
  -DRapidJSON_DIR=$RAPIDJSON_DIR
But I encountered a new error during compilation that I couldn't resolve:
-- Found CUDAToolkit: /opt/dtk/cuda/targets/x86_64-linux/include (found version "11.8.89")
-- The CUDA compiler identification is unknown
-- Detecting CUDA compiler ABI info
CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
_CMAKE_CUDA_WHOLE_FLAG
CMake Error at toolchain/install/cmake-3.30.0/share/cmake-3.30/Modules/CMakeDetermineCompilerABI.cmake:74 (try_compile):
  Failed to generate test project build system.
Call Stack (most recent call first):
  toolchain/install/cmake-3.30.0/share/cmake-3.30/Modules/CMakeTestCUDACompiler.cmake:19 (CMAKE_DETERMINE_COMPILER_ABI)
  CMakeLists.txt:325 (enable_language)
Do you have any solutions? Or did I make a mistake in my operations?
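Since CMake reports the CUDA compiler identification as unknown, one thing worth ruling out is whether the path given to `CMAKE_CUDA_COMPILER` is actually an executable that answers `--version`. The check below is a diagnostic sketch only, not a confirmed fix for this error; `check_cuda_compiler` and `NVCC_PATH` are hypothetical names, and the default path is the one from the configure command above.

```shell
#!/bin/sh
# Report whether a CUDA compiler path is executable and prints a version.
check_cuda_compiler() {
    compiler=$1
    if [ ! -x "$compiler" ]; then
        echo "not executable: $compiler"
        return 1
    fi
    if ! "$compiler" --version; then
        echo "failed to run: $compiler --version"
        return 2
    fi
    echo "ok: $compiler"
}

# NVCC_PATH is a hypothetical override; the default comes from the command above.
check_cuda_compiler "${NVCC_PATH:-/opt/dtk/cuda/bin/nvcc}" || echo "check failed with status $?"
```

If the compiler path is changed afterwards, it is safer to configure in a fresh build directory, because CMake caches compiler detection results there.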
https://mcresearch.github.io/abacus-user-guide/abacus-dcu.html
Will the tutorial above be helpful? @1Keria
@mohanchen I'd like to raise three questions about this tutorial:
- Is GPU-LCAO calculation really NOT supported by DCU ABACUS? The tutorial says:
Currently, the GPU/DCU version of ABACUS only supports calculations with the PW basis set, so the basis_type parameter in the INPUT file can only be set to pw.
But GPU-LCAO is supported in ABACUS.
- Which ks_solver can be used in DCU-ABACUS calculations? This is not clear in the tutorial.
- The version of OpenBLAS mentioned in the tutorial is relatively old. More importantly, the version tags in the compilation example are inconsistent:
tar -zxvf OpenBLAS-0.3.23.tar.gz
cd OpenBLAS-0.3.23
make USE_OPENMP=1 NO_AVX512=1 FC="gfortran -fPIC" CC="gcc -fPIC" -j8
mkdir build
make PREFIX=/work/home/your_username/OpenBLAS-0.3.21/build install
This mismatch may cause confusion.
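One way to avoid this kind of mismatch is to keep the version in a single variable. The sketch below only prints the commands from the example above with the version substituted consistently; `OPENBLAS_VERSION`, `PREFIX_ROOT`, and `openblas_build_commands` are hypothetical names, and the install prefix is illustrative.

```shell
#!/bin/sh
# Print the OpenBLAS build commands with one consistent version string.
openblas_build_commands() {
    version=$1
    prefix_root=$2
    src="OpenBLAS-$version"
    cat <<EOF
tar -zxvf $src.tar.gz
cd $src
make USE_OPENMP=1 NO_AVX512=1 FC="gfortran -fPIC" CC="gcc -fPIC" -j8
mkdir build
make PREFIX=$prefix_root/$src/build install
EOF
}

# Both values are assumptions for illustration; override them as needed.
OPENBLAS_VERSION=${OPENBLAS_VERSION:-0.3.23}
PREFIX_ROOT=${PREFIX_ROOT:-/work/home/your_username}
openblas_build_commands "$OPENBLAS_VERSION" "$PREFIX_ROOT"
```

With this layout the tarball name, the source directory, and the install prefix cannot drift apart when the version is updated.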
Thank you very much for your help. However, the method I tried first was based on this document. After successfully compiling with make, I encountered the error message: "Cannot find GPU on this computer."
OK, we will update the document soon.