
CUDA, CUTENSOR, CUQUANTUM

pcchen opened this issue 9 months ago • 12 comments

It seems that if I turn the CUDA option on via -DUSE_CUDA=ON, then cuTENSOR and cuQuantum are automatically turned on as well. If that is the case, maybe we don't need -DUSE_CUTENSOR=ON and -DUSE_CUQUANTUM=ON?

pcchen avatar Mar 15 '25 16:03 pcchen

Are you discussing the presets created in #579?

With the current settings (without #579), enabling CUDA will not automatically enable cuTENSOR and cuQuantum.

In #579, CMakeLists.txt follows the current settings: cuTENSOR and cuQuantum are not enabled when configuring with plain cmake -DUSE_CUDA=ON. They are only enabled together with CUDA when using the presets; for example, cmake --preset openblas-cuda enables both cuTENSOR and cuQuantum.
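
For reference, a preset can opt in to the extra libraries through cache variables. This is only a sketch of how such an openblas-cuda preset might look (the actual CMakePresets.json in #579 may differ):

```json
{
  "version": 6,
  "configurePresets": [
    {
      "name": "openblas-cuda",
      "cacheVariables": {
        "USE_CUDA": "ON",
        "USE_CUTENSOR": "ON",
        "USE_CUQUANTUM": "ON"
      }
    }
  ]
}
```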

I updated the PR message of #579.

IvanaGyro avatar Mar 15 '25 17:03 IvanaGyro

Question 1:

Can one build GPU support without using cuTENSOR and cuQuantum?

pcchen avatar Mar 16 '25 02:03 pcchen

I assume commit 46dc390f2d5809ed070424dd24da8f5aa8b6cdf0 reflects the current settings.

If I do cmake -DUSE_CUDA=ON, it seems to automatically turn on cuTENSOR/cuQuantum:

-- ------------------------------------------------------------------------
--   Project Cytnx, A Cross-section of Python & C++,Tensor network library 
-- ------------------------------------------------------------------------
-- 
-- /home/pcchen/github/Cytnx/cmake/Modules
--  Generator: Unix Makefiles
--  Build Target: -
--  Installation Prefix: 
--  Version: 1.0.0
-- The CXX compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CUDA compiler identification is NVIDIA 12.8.93
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0")  
-- backend = cytnx
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl  
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl  
-- LAPACK found: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.93") 
-- Looking for cuTENSOR in /home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive
--  cudaver: 12
-- ok
-- Build with CuTensor: YES
-- CuTensor: libdir:/home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive/lib/12 incdir:/home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive/include libs:/home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive/lib/12/libcutensor.so;/home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive/lib/12/libcutensorMg.so
-- Looking for cuTENSOR in /home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive
--  cudaver: 12
-- ok
-- Build with CuQuantum: YES
-- CuQuantum: libdir:/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib incdir:/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/include libs:/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib/libcutensornet.so;/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib/libcustatevec.so
--  Build CUDA Support: YES
--   - CUDA Version: 
--   - CUDA Toolkit Root: 
--   - Internal macro switch: GPU/CUDA
--   - Cudatoolkit include dir: /usr/local/cuda/targets/x86_64-linux/include
--   - Cudatoolkit lib dir: /usr/local/cuda/lib64
--   - CuSolver library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcusolver.so
--   - Curand library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcurand.so
--   - CuBlas library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcublas.so
--   - Cuda rt library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudart_static.a -ldl
--   - Cuda devrt library:  -lrt -lcudadevrt
--   - Cuda cusparse library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcusparse.so
--  Build OMP Support: NO
-- Found Python: /usr/bin/python3 (found version "3.12.3") found components: Interpreter Development Development.Module Development.Embed 
-- Found pybind11: /usr/include (found version "2.11.1")
-- pybind11 include dir: /home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive/lib/12
-- pybind11 include dir: /home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib
--  Build Python Wrapper: YES
--   - Python Excutable  : 
--   - Python Headers    : 
--   - Python Library    : 
--  Build Documentation: NO
-- |= Final FLAGS infomation for install >>>>> 
--     CXX Compiler: /usr/bin/c++
--     CXX Flags: 
--     BLAS and LAPACK Libraries: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
--     Link libraries: 
-- 
-- 
-- 
-- Configuring done (2.9s)
-- Generating done (0.0s)
-- Build files have been written to: /home/pcchen/github/Cytnx/xxx

pcchen avatar Mar 16 '25 02:03 pcchen

I have to explicitly turn off cuTENSOR/cuQuantum: cmake -DUSE_CUDA=ON -DUSE_CUTENSOR=OFF -DUSE_CUQUANTUM=OFF ..

-- ------------------------------------------------------------------------
--   Project Cytnx, A Cross-section of Python & C++,Tensor network library 
-- ------------------------------------------------------------------------
-- 
-- /home/pcchen/github/Cytnx/cmake/Modules
--  Generator: Unix Makefiles
--  Build Target: -
--  Installation Prefix: 
--  Version: 1.0.0
-- The CXX compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CUDA compiler identification is NVIDIA 12.8.93
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0")  
-- backend = cytnx
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl  
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl  
-- LAPACK found: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.93") 
--  Build CUDA Support: YES
--   - CUDA Version: 
--   - CUDA Toolkit Root: 
--   - Internal macro switch: GPU/CUDA
--   - Cudatoolkit include dir: /usr/local/cuda/targets/x86_64-linux/include
--   - Cudatoolkit lib dir: /usr/local/cuda/lib64
--   - CuSolver library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcusolver.so
--   - Curand library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcurand.so
--   - CuBlas library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcublas.so
--   - Cuda rt library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudart_static.a -ldl
--   - Cuda devrt library:  -lrt -lcudadevrt
--   - Cuda cusparse library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcusparse.so
--  Build OMP Support: NO
-- Found Python: /usr/bin/python3 (found version "3.12.3") found components: Interpreter Development Development.Module Development.Embed 
-- Found pybind11: /usr/include (found version "2.11.1")
-- pybind11 include dir: 
-- pybind11 include dir: 
--  Build Python Wrapper: YES
--   - Python Excutable  : 
--   - Python Headers    : 
--   - Python Library    : 
--  Build Documentation: NO
-- |= Final FLAGS infomation for install >>>>> 
--     CXX Compiler: /usr/bin/c++
--     CXX Flags: 
--     BLAS and LAPACK Libraries: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
--     Link libraries: 
-- 
-- 
-- 
-- Configuring done (2.9s)
-- Generating done (0.0s)
-- Build files have been written to: /home/pcchen/github/Cytnx/xxx

pcchen avatar Mar 16 '25 02:03 pcchen

Hmm. I just found out that:

If I do cmake -DUSE_CUDA=ON .. then it compiles OK.

If I do cmake -DUSE_CUDA=ON -DUSE_CUTENSOR=OFF -DUSE_CUQUANTUM=OFF .., I get the following errors:


[  1%] Building CXX object CMakeFiles/cytnx.dir/src/Bond.cpp.o
In file included from /usr/include/c++/13/bits/stl_tempbuf.h:61,
                 from /usr/include/c++/13/bits/stl_algo.h:69,
                 from /usr/include/c++/13/algorithm:61,
                 from /home/pcchen/github/Cytnx/src/RegularNetwork.cpp:1:
/usr/include/c++/13/bits/stl_construct.h: In instantiation of ‘void std::_Construct(_Tp*, _Args&& ...) [with _Tp = cytnx::Node; _Args = {shared_ptr<cytnx::Node>&}]’:
/usr/include/c++/13/bits/alloc_traits.h:661:19:   required from ‘static void std::allocator_traits<std::allocator<void> >::construct(allocator_type&, _Up*, _Args&& ...) [with _Up = cytnx::Node; _Args = {std::shared_ptr<cytnx::Node>&}; allocator_type = std::allocator<void>]’
/usr/include/c++/13/bits/shared_ptr_base.h:604:39:   required from ‘std::_Sp_counted_ptr_inplace<_Tp, _Alloc, _Lp>::_Sp_counted_ptr_inplace(_Alloc, _Args&& ...) [with _Args = {std::shared_ptr<cytnx::Node>&}; _Tp = cytnx::Node; _Alloc = std::allocator<void>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]’
/usr/include/c++/13/bits/shared_ptr_base.h:971:16:   required from ‘std::__shared_count<_Lp>::__shared_count(_Tp*&, std::_Sp_alloc_shared_tag<_Alloc>, _Args&& ...) [with _Tp = cytnx::Node; _Alloc = std::allocator<void>; _Args = {std::shared_ptr<cytnx::Node>&}; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]’
/usr/include/c++/13/bits/shared_ptr_base.h:1712:14:   required from ‘std::__shared_ptr<_Tp, _Lp>::__shared_ptr(std::_Sp_alloc_shared_tag<_Tp>, _Args&& ...) [with _Alloc = std::allocator<void>; _Args = {std::shared_ptr<cytnx::Node>&}; _Tp = cytnx::Node; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]’
/usr/include/c++/13/bits/shared_ptr.h:464:59:   required from ‘std::shared_ptr<_Tp>::shared_ptr(std::_Sp_alloc_shared_tag<_Tp>, _Args&& ...) [with _Alloc = std::allocator<void>; _Args = {std::shared_ptr<cytnx::Node>&}; _Tp = cytnx::Node]’
/usr/include/c++/13/bits/shared_ptr.h:1009:14:   required from ‘std::shared_ptr<typename std::enable_if<(! std::is_array< <template-parameter-1-1> >::value), _Tp>::type> std::make_shared(_Args&& ...) [with _Tp = cytnx::Node; _Args = {shared_ptr<cytnx::Node>&}; typename enable_if<(! is_array< <template-parameter-1-1> >::value), _Tp>::type = cytnx::Node]’
/home/pcchen/github/Cytnx/src/RegularNetwork.cpp:1099:58:   required from here
/usr/include/c++/13/bits/stl_construct.h:119:7: error: no matching function for call to ‘cytnx::Node::Node(std::shared_ptr<cytnx::Node>&)’
  119 |       ::new((void*)__p) _Tp(std::forward<_Args>(__args)...);
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/pcchen/github/Cytnx/include/Network.hpp:13,
                 from /home/pcchen/github/Cytnx/src/RegularNetwork.cpp:7:
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:54:5: note: candidate: ‘cytnx::Node::Node(std::shared_ptr<cytnx::Node>, std::shared_ptr<cytnx::Node>, const cytnx::UniTensor&)’
   54 |     Node(std::shared_ptr<Node> in_left, std::shared_ptr<Node> in_right,
      |     ^~~~
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:54:5: note:   candidate expects 3 arguments, 1 provided
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:28:5: note: candidate: ‘cytnx::Node::Node(const cytnx::Node&)’
   28 |     Node(const Node& rhs)
      |     ^~~~
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:28:22: note:   no known conversion for argument 1 from ‘std::shared_ptr<cytnx::Node>’ to ‘const cytnx::Node&’
   28 |     Node(const Node& rhs)
      |          ~~~~~~~~~~~~^~~
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:26:5: note: candidate: ‘cytnx::Node::Node()’
   26 |     Node() : is_assigned(false) {}
      |     ^~~~
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:26:5: note:   candidate expects 0 arguments, 1 provided

pcchen avatar Mar 16 '25 02:03 pcchen

I have to explicitly turn off cuTENSOR/cuQuantum: cmake -DUSE_CUDA=ON -DUSE_CUTENSOR=OFF -DUSE_CUQUANTUM=OFF ..

I checked CMakeLists.txt. Yes, the current default settings in Install.sh and CMakeLists.txt are not aligned: USE_CUTENSOR and USE_CUQUANTUM default to ON in CMakeLists.txt but to OFF in Install.sh. As you found, the user has to explicitly set both USE_CUTENSOR and USE_CUQUANTUM to OFF to enable CUDA without enabling cuTENSOR and cuQuantum.
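
The mismatch can be sketched like this (the variable names are from the thread; the option description strings and exact form are illustrative, not the real CMakeLists.txt):

```cmake
# CMakeLists.txt (current behavior, as described above): both default to ON,
# so configuring with -DUSE_CUDA=ON alone also pulls in cuTENSOR and cuQuantum.
# Install.sh, by contrast, passes -DUSE_CUTENSOR=OFF -DUSE_CUQUANTUM=OFF
# explicitly, so the two entry points disagree until they are aligned.
option(USE_CUTENSOR "Build with cuTENSOR support" ON)
option(USE_CUQUANTUM "Build with cuQuantum support" ON)
```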

With #579, the behavior of CMakeLists.txt is aligned with the behavior of the current Install.sh: cuTENSOR and cuQuantum are not enabled along with CUDA.

IvanaGyro avatar Mar 16 '25 05:03 IvanaGyro

Because I fail to compile GPU support when I turn off cuTENSOR/cuQuantum, I am wondering, with the current code:

  • Can one compile GPU support using CUDA, but WITHOUT cuTENSOR/cuQuantum?
  • Can one turn on only one of cuTENSOR/cuQuantum, or do they have to be turned on together?

pcchen avatar Mar 16 '25 07:03 pcchen

No, I can't either. I think it's a bug.

Changing this line

https://github.com/Cytnx-dev/Cytnx/blob/52561621f48b4b6d528e18b05847948808d4ca44/src/RegularNetwork.cpp#L1099

to

std::shared_ptr<Node> root = this->CtTree.nodes_container.back();

makes the build successful.

IvanaGyro avatar Mar 16 '25 10:03 IvanaGyro

This is to confirm that, after changing line 1099 to

std::shared_ptr<Node> root = this->CtTree.nodes_container.back();

it compiles OK.

pcchen avatar Mar 16 '25 13:03 pcchen

Following this thread, I propose using cuQuantum as the main dependency, to avoid the complicated compile options.

yingjerkao avatar Mar 16 '25 17:03 yingjerkao

Also, I wonder if CUTT is already integrated into cuTensorNet in cuQuantum?

yingjerkao avatar Mar 16 '25 19:03 yingjerkao

To remove the CUTT dependency, I identified only two files that use CUTT: cuTNPerm_gpu.cu and cuMovemem_gpu.cu. cuMovemem_gpu.cu has already incorporated the cuTensor API, while cuTNPerm_gpu.cu needs its CUTT APIs replaced with cutensorCreatePermutation() and cutensorPermute(), as documented at https://docs.nvidia.com/cuda/cutensor/latest/api/cutensor.html#_CPPv425cutensorCreatePermutationK16cutensorHandle_tP29cutensorOperationDescriptor_tK26cutensorTensorDescriptor_tA_K7int32_t18cutensorOperator_tK26cutensorTensorDescriptor_tA_K7int32_tK27cutensorComputeDescriptor_t

yingjerkao avatar Aug 17 '25 08:08 yingjerkao