CUDA, CUTENSOR, CUQUANTUM
It seems that if I turn on the CUDA option via -DUSE_CUDA=ON, cuTENSOR and cuQuantum are automatically turned on as well. If that is the case, maybe we don't need -DUSE_CUTENSOR=ON and -DUSE_CUQUANTUM=ON?
Are you discussing the presets created in #579?
With the current settings (without #579), enabling CUDA will not automatically enable cuTENSOR and cuQuantum.
In #579, CMakeLists.txt follows the current settings: cuTENSOR and cuQuantum are not enabled when configuring with cmake -DUSE_CUDA=ON. They are only enabled alongside CUDA when using presets; for example, cmake --preset openblas-cuda enables both cuTENSOR and cuQuantum.
I updated the PR message of #579.
Question 1:
Can one build GPU support without using cuTENSOR and cuQuantum?
I assume commit 46dc390f2d5809ed070424dd24da8f5aa8b6cdf0 reflects the current settings.
If I do cmake -DUSE_CUDA=ON, it seems to automatically turn on cuTENSOR/cuQuantum:
-- ------------------------------------------------------------------------
-- Project Cytnx, A Cross-section of Python & C++,Tensor network library
-- ------------------------------------------------------------------------
--
-- /home/pcchen/github/Cytnx/cmake/Modules
-- Generator: Unix Makefiles
-- Build Target: -
-- Installation Prefix:
-- Version: 1.0.0
-- The CXX compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CUDA compiler identification is NVIDIA 12.8.93
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0")
-- backend = cytnx
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
-- LAPACK found: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.93")
-- Looking for cuTENSOR in /home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive
-- cudaver: 12
-- ok
-- Build with CuTensor: YES
-- CuTensor: libdir:/home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive/lib/12 incdir:/home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive/include libs:/home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive/lib/12/libcutensor.so;/home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive/lib/12/libcutensorMg.so
-- Looking for cuTENSOR in /home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive
-- cudaver: 12
-- ok
-- Build with CuQuantum: YES
-- CuQuantum: libdir:/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib incdir:/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/include libs:/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib/libcutensornet.so;/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib/libcustatevec.so
-- Build CUDA Support: YES
-- - CUDA Version:
-- - CUDA Toolkit Root:
-- - Internal macro switch: GPU/CUDA
-- - Cudatoolkit include dir: /usr/local/cuda/targets/x86_64-linux/include
-- - Cudatoolkit lib dir: /usr/local/cuda/lib64
-- - CuSolver library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcusolver.so
-- - Curand library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcurand.so
-- - CuBlas library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcublas.so
-- - Cuda rt library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudart_static.a -ldl
-- - Cuda devrt library: -lrt -lcudadevrt
-- - Cuda cusparse library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcusparse.so
-- Build OMP Support: NO
-- Found Python: /usr/bin/python3 (found version "3.12.3") found components: Interpreter Development Development.Module Development.Embed
-- Found pybind11: /usr/include (found version "2.11.1")
-- pybind11 include dir: /home/pcchen/src/libcutensor-linux-x86_64-2.2.0.0-archive/lib/12
-- pybind11 include dir: /home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib
-- Build Python Wrapper: YES
-- - Python Excutable :
-- - Python Headers :
-- - Python Library :
-- Build Documentation: NO
-- |= Final FLAGS infomation for install >>>>>
-- CXX Compiler: /usr/bin/c++
-- CXX Flags:
-- BLAS and LAPACK Libraries: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
-- Link libraries:
--
--
--
-- Configuring done (2.9s)
-- Generating done (0.0s)
-- Build files have been written to: /home/pcchen/github/Cytnx/xxx
I have to explicitly turn off cuTENSOR/cuQuantum: cmake -DUSE_CUDA=ON -DUSE_CUTENSOR=OFF -DUSE_CUQUANTUM=OFF ..
-- ------------------------------------------------------------------------
-- Project Cytnx, A Cross-section of Python & C++,Tensor network library
-- ------------------------------------------------------------------------
--
-- /home/pcchen/github/Cytnx/cmake/Modules
-- Generator: Unix Makefiles
-- Build Target: -
-- Installation Prefix:
-- Version: 1.0.0
-- The CXX compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CUDA compiler identification is NVIDIA 12.8.93
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0")
-- backend = cytnx
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
-- LAPACK found: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.93")
-- Build CUDA Support: YES
-- - CUDA Version:
-- - CUDA Toolkit Root:
-- - Internal macro switch: GPU/CUDA
-- - Cudatoolkit include dir: /usr/local/cuda/targets/x86_64-linux/include
-- - Cudatoolkit lib dir: /usr/local/cuda/lib64
-- - CuSolver library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcusolver.so
-- - Curand library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcurand.so
-- - CuBlas library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcublas.so
-- - Cuda rt library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudart_static.a -ldl
-- - Cuda devrt library: -lrt -lcudadevrt
-- - Cuda cusparse library: /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcusparse.so
-- Build OMP Support: NO
-- Found Python: /usr/bin/python3 (found version "3.12.3") found components: Interpreter Development Development.Module Development.Embed
-- Found pybind11: /usr/include (found version "2.11.1")
-- pybind11 include dir:
-- pybind11 include dir:
-- Build Python Wrapper: YES
-- - Python Excutable :
-- - Python Headers :
-- - Python Library :
-- Build Documentation: NO
-- |= Final FLAGS infomation for install >>>>>
-- CXX Compiler: /usr/bin/c++
-- CXX Flags:
-- BLAS and LAPACK Libraries: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
-- Link libraries:
--
--
--
-- Configuring done (2.9s)
-- Generating done (0.0s)
-- Build files have been written to: /home/pcchen/github/Cytnx/xxx
Hmm, I just found out that:
If I do cmake -DUSE_CUDA=ON .., it compiles OK.
If I do cmake -DUSE_CUDA=ON -DUSE_CUTENSOR=OFF -DUSE_CUQUANTUM=OFF .., I get the following errors:
[ 1%] Building CXX object CMakeFiles/cytnx.dir/src/Bond.cpp.o
In file included from /usr/include/c++/13/bits/stl_tempbuf.h:61,
from /usr/include/c++/13/bits/stl_algo.h:69,
from /usr/include/c++/13/algorithm:61,
from /home/pcchen/github/Cytnx/src/RegularNetwork.cpp:1:
/usr/include/c++/13/bits/stl_construct.h: In instantiation of ‘void std::_Construct(_Tp*, _Args&& ...) [with _Tp = cytnx::Node; _Args = {shared_ptr<cytnx::Node>&}]’:
/usr/include/c++/13/bits/alloc_traits.h:661:19: required from ‘static void std::allocator_traits<std::allocator<void> >::construct(allocator_type&, _Up*, _Args&& ...) [with _Up = cytnx::Node; _Args = {std::shared_ptr<cytnx::Node>&}; allocator_type = std::allocator<void>]’
/usr/include/c++/13/bits/shared_ptr_base.h:604:39: required from ‘std::_Sp_counted_ptr_inplace<_Tp, _Alloc, _Lp>::_Sp_counted_ptr_inplace(_Alloc, _Args&& ...) [with _Args = {std::shared_ptr<cytnx::Node>&}; _Tp = cytnx::Node; _Alloc = std::allocator<void>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]’
/usr/include/c++/13/bits/shared_ptr_base.h:971:16: required from ‘std::__shared_count<_Lp>::__shared_count(_Tp*&, std::_Sp_alloc_shared_tag<_Alloc>, _Args&& ...) [with _Tp = cytnx::Node; _Alloc = std::allocator<void>; _Args = {std::shared_ptr<cytnx::Node>&}; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]’
/usr/include/c++/13/bits/shared_ptr_base.h:1712:14: required from ‘std::__shared_ptr<_Tp, _Lp>::__shared_ptr(std::_Sp_alloc_shared_tag<_Tp>, _Args&& ...) [with _Alloc = std::allocator<void>; _Args = {std::shared_ptr<cytnx::Node>&}; _Tp = cytnx::Node; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]’
/usr/include/c++/13/bits/shared_ptr.h:464:59: required from ‘std::shared_ptr<_Tp>::shared_ptr(std::_Sp_alloc_shared_tag<_Tp>, _Args&& ...) [with _Alloc = std::allocator<void>; _Args = {std::shared_ptr<cytnx::Node>&}; _Tp = cytnx::Node]’
/usr/include/c++/13/bits/shared_ptr.h:1009:14: required from ‘std::shared_ptr<typename std::enable_if<(! std::is_array< <template-parameter-1-1> >::value), _Tp>::type> std::make_shared(_Args&& ...) [with _Tp = cytnx::Node; _Args = {shared_ptr<cytnx::Node>&}; typename enable_if<(! is_array< <template-parameter-1-1> >::value), _Tp>::type = cytnx::Node]’
/home/pcchen/github/Cytnx/src/RegularNetwork.cpp:1099:58: required from here
/usr/include/c++/13/bits/stl_construct.h:119:7: error: no matching function for call to ‘cytnx::Node::Node(std::shared_ptr<cytnx::Node>&)’
119 | ::new((void*)__p) _Tp(std::forward<_Args>(__args)...);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/pcchen/github/Cytnx/include/Network.hpp:13,
from /home/pcchen/github/Cytnx/src/RegularNetwork.cpp:7:
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:54:5: note: candidate: ‘cytnx::Node::Node(std::shared_ptr<cytnx::Node>, std::shared_ptr<cytnx::Node>, const cytnx::UniTensor&)’
54 | Node(std::shared_ptr<Node> in_left, std::shared_ptr<Node> in_right,
| ^~~~
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:54:5: note: candidate expects 3 arguments, 1 provided
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:28:5: note: candidate: ‘cytnx::Node::Node(const cytnx::Node&)’
28 | Node(const Node& rhs)
| ^~~~
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:28:22: note: no known conversion for argument 1 from ‘std::shared_ptr<cytnx::Node>’ to ‘const cytnx::Node&’
28 | Node(const Node& rhs)
| ~~~~~~~~~~~~^~~
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:26:5: note: candidate: ‘cytnx::Node::Node()’
26 | Node() : is_assigned(false) {}
| ^~~~
/home/pcchen/github/Cytnx/include/contraction_tree.hpp:26:5: note: candidate expects 0 arguments, 1 provided
I checked CMakeLists.txt. Yes, the current default settings in Install.sh and CMakeLists.txt are not aligned: USE_CUTENSOR and USE_CUQUANTUM default to ON in CMakeLists.txt but to OFF in Install.sh. As you found, the user has to explicitly set USE_CUTENSOR and USE_CUQUANTUM to OFF if they want to enable CUDA without enabling cuTENSOR and cuQuantum.
With #579, the behavior of CMakeLists.txt is aligned with that of the current Install.sh: cuTENSOR and cuQuantum are not enabled automatically with CUDA.
Because I fail to compile GPU support when I turn off cuTENSOR/cuQuantum, I am wondering, with the current code:
- Can one compile GPU support using CUDA, but WITHOUT cuTENSOR/cuQuantum?
- Can one turn on only one of cuTENSOR/cuQuantum, or do they have to be turned on together?
No, I can't either. I think it's a bug.
Changing this line
https://github.com/Cytnx-dev/Cytnx/blob/52561621f48b4b6d528e18b05847948808d4ca44/src/RegularNetwork.cpp#L1099
to
std::shared_ptr<Node> root = this->CtTree.nodes_container.back();
makes the build successful.
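For context, the error log above pins this down: at RegularNetwork.cpp line 1099, std::make_shared<Node>(...) forwards a std::shared_ptr<Node> to a Node constructor, and contraction_tree.hpp declares no such overload (only default, copy, and a three-argument constructor). Below is a minimal sketch of the failing pattern and the fix; the Node class here is a reduced, illustrative stand-in, not the real cytnx::Node.

#include <memory>
#include <vector>

// Reduced stand-in for cytnx::Node: only the constructors the compiler
// listed as candidates are kept (the three-argument one is simplified).
struct Node {
  Node() = default;
  Node(const Node&) = default;
  Node(std::shared_ptr<Node> left, std::shared_ptr<Node> right) {}
};

int main() {
  std::vector<std::shared_ptr<Node>> nodes_container;
  nodes_container.push_back(std::make_shared<Node>());

  // Failing pattern (reconstructed from the error log): make_shared tries to
  // construct a Node from a shared_ptr<Node>, an overload that does not exist.
  // std::shared_ptr<Node> root = std::make_shared<Node>(nodes_container.back());

  // Proposed fix: share ownership of the existing root node directly.
  std::shared_ptr<Node> root = nodes_container.back();
  return root ? 0 : 1;
}

Note that the fix aliases the existing node rather than copying it; if a fresh copy were intended, dereferencing first, i.e. std::make_shared<Node>(*this->CtTree.nodes_container.back()), would select the copy constructor instead. Whether aliasing or copying is the intended semantics is for the maintainers to confirm.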
This is to confirm that, after changing line 1099 to
std::shared_ptr<Node> root = this->CtTree.nodes_container.back();
it compiles OK.
Following this thread, I propose using cuQuantum as the main dependency to avoid the complicated compile options.
Also, I wonder if CUTT is already integrated into cuTensorNet in cuQuantum?
To remove the CUTT dependency, I identified only two files that use CUTT: cuTNPerm_gpu.cu and cuMovemem_gpu.cu. cuMovemem_gpu.cu already incorporates the cuTENSOR API, while cuTNPerm_gpu.cu needs its CUTT calls replaced with cutensorCreatePermutation() and cutensorPermute(), as documented at https://docs.nvidia.com/cuda/cutensor/latest/api/cutensor.html#_CPPv425cutensorCreatePermutationK16cutensorHandle_tP29cutensorOperationDescriptor_tK26cutensorTensorDescriptor_tA_K7int32_t18cutensorOperator_tK26cutensorTensorDescriptor_tA_K7int32_tK27cutensorComputeDescriptor_t
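To make the replacement concrete, here is a minimal, self-contained sketch of the cuTENSOR 2.x permutation path (not Cytnx code; the extents, modes, and error-check macro are illustrative assumptions). It permutes a dense float tensor A[a,b,c] into B[c,a,b], the kind of transpose cuTNPerm_gpu.cu currently delegates to CUTT:

#include <cuda_runtime.h>
#include <cutensor.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort on any cuTENSOR error (illustrative helper, not a cuTENSOR API).
#define CHECK_CUTENSOR(call)                                   \
  do {                                                         \
    cutensorStatus_t s = (call);                               \
    if (s != CUTENSOR_STATUS_SUCCESS) {                        \
      std::fprintf(stderr, "%s\n", cutensorGetErrorString(s)); \
      std::exit(EXIT_FAILURE);                                 \
    }                                                          \
  } while (0)

int main() {
  // A has modes (a,b,c) with extents 2,3,4; B is the permuted layout (c,a,b).
  const std::vector<int32_t> modeA{'a', 'b', 'c'};
  const std::vector<int32_t> modeB{'c', 'a', 'b'};
  const int64_t extA[] = {2, 3, 4};
  const int64_t extB[] = {4, 2, 3};
  const size_t n = 2 * 3 * 4;

  float *A = nullptr, *B = nullptr;  // device buffers (initialization omitted)
  cudaMalloc(&A, n * sizeof(float));
  cudaMalloc(&B, n * sizeof(float));

  cutensorHandle_t handle;
  CHECK_CUTENSOR(cutensorCreate(&handle));

  // Dense descriptors; a null stride array means packed column-major layout.
  cutensorTensorDescriptor_t descA, descB;
  CHECK_CUTENSOR(cutensorCreateTensorDescriptor(handle, &descA, 3, extA, nullptr,
                                                CUTENSOR_R_32F, /*alignment=*/128));
  CHECK_CUTENSOR(cutensorCreateTensorDescriptor(handle, &descB, 3, extB, nullptr,
                                                CUTENSOR_R_32F, /*alignment=*/128));

  // Describe B = identity(A) with permuted modes, then plan and execute.
  cutensorOperationDescriptor_t op;
  CHECK_CUTENSOR(cutensorCreatePermutation(handle, &op, descA, modeA.data(),
                                           CUTENSOR_OP_IDENTITY, descB, modeB.data(),
                                           CUTENSOR_COMPUTE_DESC_32F));
  cutensorPlanPreference_t pref;
  CHECK_CUTENSOR(cutensorCreatePlanPreference(handle, &pref, CUTENSOR_ALGO_DEFAULT,
                                              CUTENSOR_JIT_MODE_NONE));
  cutensorPlan_t plan;
  CHECK_CUTENSOR(cutensorCreatePlan(handle, &plan, op, pref,
                                    /*workspaceSizeLimit=*/0));

  const float alpha = 1.0f;  // scalar type follows the compute descriptor
  CHECK_CUTENSOR(cutensorPermute(handle, plan, &alpha, A, B, /*stream=*/nullptr));
  cudaDeviceSynchronize();

  cutensorDestroyPlan(plan);
  cutensorDestroyPlanPreference(pref);
  cutensorDestroyOperationDescriptor(op);
  cutensorDestroyTensorDescriptor(descA);
  cutensorDestroyTensorDescriptor(descB);
  cutensorDestroy(handle);
  cudaFree(A);
  cudaFree(B);
  return 0;
}

If my reading of the docs is right, this descriptor -> operation -> plan -> execute flow would replace CUTT's cuttPlan()/cuttExecute() pair one-for-one in cuTNPerm_gpu.cu, with complex dtypes handled by switching the data and compute descriptors (e.g. CUTENSOR_C_64F).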