
Set c++17 standard in CMake for recent torch/cuda versions

Open RaulPPelaez opened this issue 1 year ago • 10 comments

Compiling with CUDA 12 and a very recent pytorch version (such as v2.1.0 from the nightly) will make compilation fail because C++17 is required to compile pytorch:

(test7) $ Torch_DIR=$(python -c 'import torch;print(torch.utils.cmake_prefix_path)')  cmake -DCMAKE_BUILD_TYPE=Release ..                                 
make -j15                                                                                                                                                                    
-- The CXX compiler identification is GNU 12.3.0                                                                                                                             
-- Detecting CXX compiler ABI info                                                                                                                                           
-- Detecting CXX compiler ABI info - done                                                                                                                                    
-- Check for working CXX compiler: /shared/raul/mambaforge/envs/test7/bin/x86_64-conda-linux-gnu-c++ - skipped                                                               
-- Detecting CXX compile features                                                                                                                                            
-- Detecting CXX compile features - done                                                                                                                                     
-- The CUDA compiler identification is NVIDIA 12.1.105                                                                                                                       
-- Detecting CUDA compiler ABI info                                                                                                                                          
-- Detecting CUDA compiler ABI info - done                                                                                                                                   
-- Check for working CUDA compiler: /shared/raul/mambaforge/envs/test7/bin/nvcc - skipped                                                                                    
-- Detecting CUDA compile features                                                                                                                                           
-- Detecting CUDA compile features - done                                                                                                                                    
-- Found Python3: /shared/raul/mambaforge/envs/test7/bin/python3.11 (found version "3.11.0") found components: Interpreter Development Development.Module Development.Embed  
-- Found CUDA: /shared/raul/mambaforge/envs/test7 (found version "12.1")                                                                                                     
-- Found CUDAToolkit: /shared/raul/mambaforge/envs/test7/include (found version "12.1.105") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD                                            
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed                                   
-- Looking for pthread_create in pthreads                                             
-- Looking for pthread_create in pthreads - not found                                 
-- Looking for pthread_create in pthread                                              
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Caffe2: CUDA detected: 12.1
-- Caffe2: CUDA nvcc is: /shared/raul/mambaforge/envs/test7/bin/nvcc
-- Caffe2: CUDA toolkit directory: /shared/raul/mambaforge/envs/test7
-- Caffe2: Header version is: 12.1
-- /shared/raul/mambaforge/envs/test7/lib/libnvrtc.so shorthash is 8144a3bc      
-- USE_CUDNN is set to 0. Compiling without cuDNN support                          
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Autodetected CUDA architecture(s):  8.9 8.9 8.9 8.9
-- Added CUDA NVCC flags for: -gencode;arch=compute_89,code=sm_89                                                                                                            
-- MKL_ARCH: intel64                                                                                                                                                         
-- MKL_ROOT /shared/raul/mambaforge/envs/test7                                                                                                                               
-- MKL_LINK: dynamic                                                                                                                                                         
-- MKL_INTERFACE_FULL: intel_ilp64                                                                                                                                           
-- MKL_THREADING: intel_thread                                                                                                                                               
-- MKL_MPI: intelmpi                                                                                                                                                         
CMake Warning at /shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):                 
  static library kineto_LIBRARY-NOTFOUND not found.                                                                                                                          
Call Stack (most recent call first):                                                                                                                                         
  /shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:13 (find_package)
                                                                                                                                                                             
                                                                                                                                                                             
-- Configuring done (1.6s)                                                                                                                                                   
-- Generating done (0.1s)                                                                                                                                                    
-- Build files have been written to: /shared/raul/NNPOps/build                                                                                                               
(test7) $ make -j15
[ 21%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/BatchedNN.cpp.o                                                                                          
[ 21%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/ani/CpuANISymmetryFunctions.cpp.o                            
[ 26%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/CFConv.cpp.o                                                                                             
[ 26%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/ani/CudaANISymmetryFunctions.cu.o                                            
[ 34%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/SymmetryFunctions.cpp.o                                          
[ 34%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/CFConvNeighbors.cpp.o                                                       
[ 43%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/neighbors/getNeighborPairsCPU.cpp.o
[ 43%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/neighbors/getNeighborPairsCUDA.cu.o
[ 52%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/neighbors/neighbors.cpp.o                                                                                
[ 60%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/pme/pmeCPU.cpp.o                                                                                         
[ 60%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/pme/pme.cpp.o                                                                                            
[ 60%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/schnet/CudaCFConv.cu.o                                                                                          
[ 60%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/pme/pmeCUDA.cu.o                                                                                        
[ 60%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/schnet/CpuCFConv.cpp.o                                                                                           
In file included from /shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/include/torch/extension.h:4,                                                     
                 from /shared/raul/NNPOps/src/pytorch/pme/pmeCUDA.cu:1:                                                                                                      
/shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: #error C++17 or later compatible compiler is required to use PyTorch.
    4 | #error C++17 or later compatible compiler is required to use PyTorch.
      |  ^~~~~
[ 60%] Built target copy_test   

Simply setting the standard from 14 to 17 in CMakeLists.txt fixes it. CUDA 11 also supports C++17, but CUDA 10.2 does not, so I check for that case and leave the standard at C++14 there. GCC has supported C++17 since version 7, so I make C++17 the default otherwise.
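The version check described above could look roughly like the following in CMake. This is only a sketch, not the actual NNPOps CMakeLists.txt: the project name and the `SKETCH_STANDARD` variable are made up for illustration, and it assumes CMake 3.18 or later (needed for `CMAKE_CUDA_STANDARD 17`) with CUDA enabled as a first-class language, so that `CMAKE_CUDA_COMPILER_VERSION` is populated.

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical project name; enabling CUDA here populates
# CMAKE_CUDA_COMPILER_VERSION with the nvcc version.
project(StandardSelectionSketch LANGUAGES CXX CUDA)

# nvcc accepts -std=c++17 starting with CUDA 11.0, so older
# toolkits (e.g. CUDA 10.2) are left at C++14.
if(CMAKE_CUDA_COMPILER_VERSION VERSION_LESS "11.0")
  set(SKETCH_STANDARD 14)  # illustrative helper variable
else()
  set(SKETCH_STANDARD 17)
endif()

# Apply the same standard to host and device code.
set(CMAKE_CXX_STANDARD ${SKETCH_STANDARD})
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CUDA_STANDARD ${SKETCH_STANDARD})
set(CMAKE_CUDA_STANDARD_REQUIRED ON)

message(STATUS "Using C++${SKETCH_STANDARD} for CXX and CUDA")
```

Setting the standard before any targets are defined means every target created afterwards inherits it. A project that detects CUDA through `find_package` instead would need to compare against whatever version variable that module provides.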

RaulPPelaez avatar Aug 08 '23 11:08 RaulPPelaez

This is ready to merge.

RaulPPelaez avatar Aug 17 '23 13:08 RaulPPelaez

The CUDA 11.8 build tends to fail with some kind of disk access error when installing CUDA. It must be a bug in the Jimver action. There is a new version, so let's try that...

RaulPPelaez avatar Aug 17 '23 14:08 RaulPPelaez

I have purged the GA cache. If it fails, try to rerun.

raimis avatar Aug 17 '23 14:08 raimis

I am not sure whether I lack the rights or just do not know how, but I cannot rerun the CI. I will just push a spurious commit.

RaulPPelaez avatar Aug 17 '23 14:08 RaulPPelaez

11.8 still refuses to download, it seems.

RaulPPelaez avatar Aug 17 '23 15:08 RaulPPelaez

[Linux (CUDA 11.8, Python 3.10, PyTorch 2.0)](https://github.com/openmm/NNPOps/actions/runs/5892449251/job/15981745203#step:1:39)
You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 0 MB

raimis avatar Aug 17 '23 15:08 raimis

Do you know if this disk limit is per action or per individual check? If it is the former, maybe we can do something. If it is the latter, I do not really know why CUDA 11.2 would take enough more space than 11.8 to go over the threshold.

RaulPPelaez avatar Aug 17 '23 15:08 RaulPPelaez

This is ready for review. With the changes in conda-forge regarding CUDA, from version 12 onwards there is no need to install CUDA at the OS level in the CI (so no Jimver/cuda GitHub action). This is good news here because the current CI is constantly running out of space. However, the workflow is different enough that I decided to move it to a separate CI workflow. The idea is that the old one will eventually be dropped (when CUDA 12 is the oldest supported version, I guess).

I had to deal with a couple of quirks in the compilation process for PyTorch 2.1 and CUDA 12. In particular:

  • torch wrongly autodetects the CUDA archs and passes sm_35 to nvcc 12, which is deprecated. To fix it I just set TORCH_CUDA_ARCHS=8.9, picking one arch as an example.
  • conda-forge installs the CUDA headers to a non-standard directory, $CONDA_PREFIX/$targetsDir/include. For some reason this prevents torch from finding the CUDA headers, so I had to set CUDA_INC_PATH manually to that directory.

RaulPPelaez avatar Nov 17 '23 11:11 RaulPPelaez

I am using the changes to CMakeLists.txt as a patch to build this https://github.com/conda-forge/nnpops-feedstock/pull/29

RaulPPelaez avatar Nov 17 '23 11:11 RaulPPelaez

@mikemhenry I would like to merge this, but I believe the self-hosted runner is not working.

RaulPPelaez avatar Mar 04 '24 11:03 RaulPPelaez