
Fail to compile GPU version (Ubuntu 24, notebook computer)

Open pcchen opened this issue 9 months ago • 12 comments

With this CMake configuration:

-- ------------------------------------------------------------------------
--   Project Cytnx, A Cross-section of Python & C++,Tensor network library 
-- ------------------------------------------------------------------------
-- 
-- /home/pcchen/github/Cytnx/cmake/Modules
--  Generator: Unix Makefiles
--  Build Target: -
--  Installation Prefix: /usr/local
--  Version: 1.0.0
-- The CUDA compiler identification is NVIDIA 12.0.140
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- backend = cytnx
-- LAPACK found: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
-- Found CUDAToolkit: /usr/include (found version "12.0.140") 
-- Looking for cuTENSOR in /home/pcchen/src/libcutensor-linux-x86_64-2.1.0.9-archive
--  cudaver: 12
-- ok
-- Build with CuTensor: YES
-- CuTensor: libdir:/home/pcchen/src/libcutensor-linux-x86_64-2.1.0.9-archive/lib/12 incdir:/home/pcchen/src/libcutensor-linux-x86_64-2.1.0.9-archive/include libs:/home/pcchen/src/libcutensor-linux-x86_64-2.1.0.9-archive/lib/12/libcutensor.so;/home/pcchen/src/libcutensor-linux-x86_64-2.1.0.9-archive/lib/12/libcutensorMg.so
-- Looking for cuTENSOR in /home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive
--  cudaver: 12
-- ok
-- Build with CuQuantum: YES
-- CuQuantum: libdir:/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib incdir:/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/include libs:/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib/libcutensornet.so;/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib/libcustatevec.so
--  Build CUDA Support: YES
--   - CUDA Version: 
--   - CUDA Toolkit Root: 
--   - Internal macro switch: GPU/CUDA
--   - Cudatoolkit include dir: /usr/include
--   - Cudatoolkit lib dir: /usr/lib/x86_64-linux-gnu
--   - CuSolver library: /usr/lib/x86_64-linux-gnu/libcusolver.so
--   - Curand library: /usr/lib/x86_64-linux-gnu/libcurand.so
--   - CuBlas library: /usr/lib/x86_64-linux-gnu/libcublas.so
--   - Cuda rt library: /usr/lib/x86_64-linux-gnu/libcudart_static.a -ldl
--   - Cuda devrt library:  -lrt -lcudadevrt
--   - Cuda cusparse library: /usr/lib/x86_64-linux-gnu/libcusparse.so
--  Build OMP Support: NO
-- Found pybind11: /usr/include (found version "2.11.1")
-- pybind11 include dir: /home/pcchen/src/libcutensor-linux-x86_64-2.1.0.9-archive/lib/12
-- pybind11 include dir: /home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/lib
--  Build Python Wrapper: YES
--   - Python Excutable  : 
--   - Python Headers    : 
--   - Python Library    : 
--  Build Documentation: NO
-- |= Final FLAGS infomation for install >>>>> 
--     CXX Compiler: /usr/bin/c++
--     CXX Flags: 
--     BLAS and LAPACK Libraries: /opt/intel/oneapi/mkl/2025.0/lib/libmkl_rt.so;-lm;-ldl;-lm;-ldl
--     Link libraries: 
-- 
-- 
-- 
-- Configuring done (2.7s)
-- Generating done (0.0s)
-- Build files have been written to: /home/pcchen/github/Cytnx/build_mkl_gpu

I got the following errors:

[ 89%] Building CUDA object CMakeFiles/cytnx.dir/src/backend/utils_internal_gpu/cuGetElems_gpu.cu.o
/home/pcchen/github/Cytnx/src/backend/utils_internal_gpu/cuFill_gpu.cu(35): error: no suitable user-defined conversion from "const cytnx::cytnx_complex128" to "CudaDType" exists
          detected during instantiation of "void cytnx::utils_internal::FillGpu(void *, const DType &, cytnx::cytnx_uint64) [with DType=cytnx::cytnx_complex128]" 
(38): here

/home/pcchen/github/Cytnx/src/backend/utils_internal_gpu/cuFill_gpu.cu(35): error: no instance of function template "cytnx::utils_internal::FillGpuKernel" matches the argument list
            argument types are: (CudaDType *, <error-type>, cytnx::cytnx_uint64)
          detected during instantiation of "void cytnx::utils_internal::FillGpu(void *, const DType &, cytnx::cytnx_uint64) [with DType=cytnx::cytnx_complex128]" 
(38): here

/home/pcchen/github/Cytnx/src/backend/utils_internal_gpu/cuFill_gpu.cu(35): error: no suitable user-defined conversion from "const cytnx::cytnx_complex64" to "CudaDType" exists
          detected during instantiation of "void cytnx::utils_internal::FillGpu(void *, const DType &, cytnx::cytnx_uint64) [with DType=cytnx::cytnx_complex64]" 
(39): here

/home/pcchen/github/Cytnx/src/backend/utils_internal_gpu/cuFill_gpu.cu(35): error: no instance of function template "cytnx::utils_internal::FillGpuKernel" matches the argument list
            argument types are: (CudaDType *, <error-type>, cytnx::cytnx_uint64)
          detected during instantiation of "void cytnx::utils_internal::FillGpu(void *, const DType &, cytnx::cytnx_uint64) [with DType=cytnx::cytnx_complex64]" 
(39): here

4 errors detected in the compilation of "/home/pcchen/github/Cytnx/src/backend/utils_internal_gpu/cuFill_gpu.cu".
make[2]: *** [CMakeFiles/cytnx.dir/build.make:3310:CMakeFiles/cytnx.dir/src/backend/utils_internal_gpu/cuFill_gpu.cu.o] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:366:CMakeFiles/cytnx.dir/all] Error 2
make: *** [Makefile:136:all] Error 2

Here is the GPU info from nvidia-smi:

Fri Mar 14 14:57:52 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce MX450           Off | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8              N/A / 16W  |      7MiB / 2048MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI              PID   Type   Process name                      GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A            3202      G   /usr/lib/xorg/Xorg                      4MiB |
+---------------------------------------------------------------------------------------+

pcchen avatar Mar 14 '25 06:03 pcchen

Your compiler may not be finding the correct "cuda/std/complex" header, which has shipped with CUDA since 11.4. Could you follow the include paths in compile_commands.json and check the version or content of the "cuda/std/complex" that the compiler finds?

Another possible solution is upgrading your CUDA version to 12.6. Someone seemed to fail to build with 12.2 but succeeded with 12.6.
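
For what it's worth, here is a minimal sketch of the pattern those errors point at. It is not the Cytnx source: HostComplex, CudaComplex, and fill() are made-up stand-ins for cytnx_complex128, CudaDType, and FillGpu, and the assumption is that the std::complex interoperability constructor of cuda::std::complex is what an outdated header is missing. With a new enough cuda/std/complex the implicit conversion compiles; with an older header picked up first on the include path, nvcc reports the same "no suitable user-defined conversion" error as above.

// Hypothetical repro sketch -- not the Cytnx code. Build with: nvcc -std=c++17 -c repro.cu
#include <complex>
#include <cuda/std/complex>

using HostComplex = std::complex<double>;        // stand-in for cytnx::cytnx_complex128
using CudaComplex = cuda::std::complex<double>;  // stand-in for CudaDType

template <class T>
__global__ void fill_kernel(T* first, T value, unsigned long long n) {
  unsigned long long i = blockIdx.x * (unsigned long long)blockDim.x + threadIdx.x;
  if (i < n) first[i] = value;
}

void fill(void* ptr, const HostComplex& value, unsigned long long n) {
  // This host-side conversion is what fails when an outdated <cuda/std/complex> is found:
  CudaComplex v = value;
  fill_kernel<<<(unsigned)((n + 255) / 256), 256>>>(
      reinterpret_cast<CudaComplex*>(ptr), v, n);
}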

IvanaGyro avatar Mar 14 '25 07:03 IvanaGyro

This?

/usr/bin/nvcc -forward-unknown-to-host-compiler \
 -DMKL_ILP64 -DUNI_CUQUANTUM -DUNI_CUTENSOR -DUNI_GPU -DUNI_MKL \
-D_LIBCPP_DISABLE_AVAILABILITY -D_LIBCPP_ENABLE_CXX17_REMOVED_UNARY_BINARY_FUNCTION \
-I/home/pcchen/github/Cytnx/src \
-I/home/pcchen/github/Cytnx/include \
-I/home/pcchen/src/libcutensor-linux-x86_64-2.1.0.9-archive/include \
-I/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/include  \
-Xcudafe=--display_error_number -lineinfo -m64 -O3 -DNDEBUG -std=c++17 \
-arch=native -Xcompiler=-fPIC -Wformat=0 -w -fsized-deallocation -x \
cu -rdc=true -c /home/pcchen/github/Cytnx/src/backend/utils_internal_gpu/cuGetElems_gpu.cu \
-o CMakeFiles/cytnx.dir/src/backend/utils_internal_gpu/cuGetElems_gpu.cu.o

pcchen avatar Mar 14 '25 08:03 pcchen

I used apt install for these packages:

libcudart12/noble,now 12.0.146~12.0.1-4build4 amd64 [installed,automatic]
nvidia-cuda-dev/noble,now 12.0.146~12.0.1-4build4 amd64 [installed,automatic]
nvidia-cuda-gdb/noble,now 12.0.140~12.0.1-4build4 amd64 [installed,automatic]
nvidia-cuda-toolkit-doc/noble,noble,now 12.0.1-4build4 all [installed,automatic]
nvidia-cuda-toolkit/noble,now 12.0.140~12.0.1-4build4 amd64 [installed]

pcchen avatar Mar 14 '25 08:03 pcchen

> This?
>
> /usr/bin/nvcc -forward-unknown-to-host-compiler \
>  -DMKL_ILP64 -DUNI_CUQUANTUM -DUNI_CUTENSOR -DUNI_GPU -DUNI_MKL \
> -D_LIBCPP_DISABLE_AVAILABILITY -D_LIBCPP_ENABLE_CXX17_REMOVED_UNARY_BINARY_FUNCTION \
> -I/home/pcchen/github/Cytnx/src \
> -I/home/pcchen/github/Cytnx/include \
> -I/home/pcchen/src/libcutensor-linux-x86_64-2.1.0.9-archive/include \
> -I/home/pcchen/src/cuquantum-linux-x86_64-24.11.0.21_cuda12-archive/include  \
> -Xcudafe=--display_error_number -lineinfo -m64 -O3 -DNDEBUG -std=c++17 \
> -arch=native -Xcompiler=-fPIC -Wformat=0 -w -fsized-deallocation -x \
> cu -rdc=true -c /home/pcchen/github/Cytnx/src/backend/utils_internal_gpu/cuGetElems_gpu.cu \
> -o CMakeFiles/cytnx.dir/src/backend/utils_internal_gpu/cuGetElems_gpu.cu.o

This is interesting. CUDA libcu++'s header path is not in the include paths. Could you try to find the header "cuda/std/complex" that your nvcc found?
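
One way to see which headers the compiler actually resolves is to compile a small check file with the same nvcc and -I flags the build uses. This is a diagnostic sketch of my own, not part of Cytnx; the file name and the use of the _LIBCUDACXX_VERSION macro are assumptions.

// check_cuda_headers.cu -- hypothetical diagnostic: nvcc -std=c++17 check_cuda_headers.cu -o check
#include <cstdio>
#include <complex>
#include <cuda_runtime_api.h>   // CUDART_VERSION
#include <cuda/std/complex>     // the libcu++ header in question

int main() {
  std::printf("nvcc    : %d.%d\n", __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__);
  std::printf("CUDART  : %d\n", CUDART_VERSION);
#ifdef _LIBCUDACXX_VERSION
  std::printf("libcu++ : %d\n", (int)_LIBCUDACXX_VERSION);
#else
  std::printf("libcu++ : version macro not defined (old header?)\n");
#endif
  // If the header found is new enough, this std::complex -> cuda::std::complex
  // conversion compiles; if it is the same stale header the build picks up,
  // this file fails with the same conversion error as cuFill_gpu.cu.
  std::complex<double> h(1.0, 2.0);
  cuda::std::complex<double> d = h;
  std::printf("conversion OK: (%f, %f)\n", d.real(), d.imag());
  return 0;
}

Running nvcc with -E on such a file and searching the preprocessed output for cuda/std/complex should also show which copy of the header is being included.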

IvanaGyro avatar Mar 14 '25 08:03 IvanaGyro

I guess cmake recognizes your /usr/include as the CUDA toolkit include folder. Is "cuda/std/complex" in your /usr/include?

IvanaGyro avatar Mar 14 '25 08:03 IvanaGyro

I have /usr/include/cuda/std/complex and /usr/local/cuda/include/cuda/std/complex.

I don't know which package provides which orz.

pcchen avatar Mar 14 '25 08:03 pcchen

I installed the CUDA toolkit 12.8 following the official website https://developer.nvidia.com/cuda-toolkit

Now I have so many CUDA toolkit packages installed, but it still fails at the same place.

cuda-12-8/unknown,now 12.8.1-1 amd64 [installed,automatic]
cuda-cccl-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-command-line-tools-12-8/unknown,now 12.8.1-1 amd64 [installed,automatic]
cuda-compiler-12-8/unknown,now 12.8.1-1 amd64 [installed,automatic]
cuda-crt-12-8/unknown,now 12.8.93-1 amd64 [installed,automatic]
cuda-cudart-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-cudart-dev-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-cuobjdump-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-cupti-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-cupti-dev-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-cuxxfilt-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-demo-suite-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-documentation-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-driver-dev-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-gdb-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-keyring/unknown,unknown,now 1.1-1 all [installed]
cuda-libraries-12-8/unknown,now 12.8.1-1 amd64 [installed,automatic]
cuda-libraries-dev-12-8/unknown,now 12.8.1-1 amd64 [installed,automatic]
cuda-nsight-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-nsight-compute-12-8/unknown,now 12.8.1-1 amd64 [installed,automatic]
cuda-nsight-systems-12-8/unknown,now 12.8.1-1 amd64 [installed,automatic]
cuda-nvcc-12-8/unknown,now 12.8.93-1 amd64 [installed,automatic]
cuda-nvdisasm-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-nvml-dev-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-nvprof-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-nvprune-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-nvrtc-12-8/unknown,now 12.8.93-1 amd64 [installed,automatic]
cuda-nvrtc-dev-12-8/unknown,now 12.8.93-1 amd64 [installed,automatic]
cuda-nvtx-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-nvvm-12-8/unknown,now 12.8.93-1 amd64 [installed,automatic]
cuda-nvvp-12-8/unknown,now 12.8.93-1 amd64 [installed,automatic]
cuda-opencl-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-opencl-dev-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-profiler-api-12-8/unknown,now 12.8.90-1 amd64 [installed,automatic]
cuda-runtime-12-8/unknown,now 12.8.1-1 amd64 [installed,automatic]
cuda-sanitizer-12-8/unknown,now 12.8.93-1 amd64 [installed,automatic]
cuda-toolkit-12-8-config-common/unknown,now 12.8.90-1 all [installed,automatic]
cuda-toolkit-12-8/unknown,now 12.8.1-1 amd64 [installed]
cuda-toolkit-12-config-common/unknown,now 12.8.90-1 all [installed,automatic]
cuda-toolkit-config-common/unknown,now 12.8.90-1 all [installed,automatic]
cuda-tools-12-8/unknown,now 12.8.1-1 amd64 [installed,automatic]
cuda-visual-tools-12-8/unknown,now 12.8.1-1 amd64 [installed,automatic]
cuda/unknown,now 12.8.1-1 amd64 [installed]
libcudart12/noble,now 12.0.146~12.0.1-4build4 amd64 [installed,automatic]
nvidia-cuda-dev/noble,now 12.0.146~12.0.1-4build4 amd64 [installed,automatic]
nvidia-cuda-gdb/noble,now 12.0.140~12.0.1-4build4 amd64 [installed,automatic]
nvidia-cuda-toolkit-doc/noble,noble,now 12.0.1-4build4 all [installed,automatic]
nvidia-cuda-toolkit/noble,now 12.0.140~12.0.1-4build4 amd64 [installed]

pcchen avatar Mar 14 '25 08:03 pcchen

Now my nvidia-smi fails because it couldn't communicate with the NVIDIA driver.

pcchen avatar Mar 14 '25 09:03 pcchen

> Now my nvidia-smi fails because it couldn't communicate with the NVIDIA driver.

That is an incompatibility between the kernel driver and the installed CUDA version. You need to restart the computer; if it keeps happening, then you have a mismatch between the kernel driver package and the toolkit package.
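
As a quick way to see such a mismatch, here is a small sketch of my own (not something Cytnx ships) using the documented runtime calls cudaDriverGetVersion and cudaRuntimeGetVersion:

// check_versions.cu -- hypothetical helper; build with: nvcc check_versions.cu -o check_versions
#include <cstdio>
#include <cuda_runtime_api.h>

int main() {
  int driver = 0, runtime = 0;
  cudaError_t err = cudaDriverGetVersion(&driver);
  if (err != cudaSuccess) {
    // Same symptom as a failing nvidia-smi: the runtime cannot talk to the driver.
    std::printf("cudaDriverGetVersion failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  cudaRuntimeGetVersion(&runtime);
  // Versions are encoded as 1000*major + 10*minor, e.g. 12020 == 12.2.
  std::printf("driver supports up to CUDA %d.%d, runtime is CUDA %d.%d\n",
              driver / 1000, (driver % 1000) / 10,
              runtime / 1000, (runtime % 1000) / 10);
  if (runtime > driver)
    std::printf("runtime is newer than the driver supports -> expect failures\n");
  return 0;
}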

ianmccul avatar Mar 14 '25 09:03 ianmccul

After a few re-installations of Ubuntu 24, I was finally able to compile it.

If I use the kernel driver (550) and CUDA toolkit (12.0) provided by Ubuntu 24, then the following error appears:

error: no suitable user-defined conversion from "const cytnx::cytnx_complex128" to "CudaDType" exists
          detected during instantiation of "void cytnx::utils_internal::FillGpu(void *, const DType &, cytnx::cytnx_uint64) [with DType=cytnx::cytnx_complex128]" 

Note that nvidia-detector returns nvidia-driver-570.

If I apt install cuda-toolkit 12.8 from the CUDA repository, it installs kernel driver 570 and toolkit 12.8, BUT nvidia-smi then fails because it couldn't communicate with the NVIDIA driver.

The following path works:

  • Use Ubuntu 24's nvidia-driver (550).
  • Use the run file to install the CUDA toolkit 12.8, but turn off the option to install kernel driver 570.
  • Perform the mandatory post-installation actions.

I think kernel driver 570 somehow does not work, despite nvidia-detector returning nvidia-driver-570.

pcchen avatar Mar 15 '25 04:03 pcchen

Could you post the cmake configuration result of the successful try?

IvanaGyro avatar Mar 15 '25 07:03 IvanaGyro

@pcchen Can you check again with commit bc222ec17b08fb6ea4f35ceb2bbb682076f7a7da?

yingjerkao avatar Aug 19 '25 02:08 yingjerkao