[Build] v2.4.0 with Cuda 11.0
Describe the issue
On some builds (I can't yet say why some work and others don't), compilation fails with:
src/tests/dense_lu.cu(114): error: identifier "cudaMallocAsync" is undefined
Environment information:
- OS: [e.g. Redhat 8.8]
- Compiler version: [e.g. gcc 8.3.0]
- CMake version: [e.g. 3.22.2]
- CUDA used for AMGX compilation: [e.g. CUDA 11.0]
- MPI version (if applicable): [e.g. OpenMPI 4.1.4]
- AMGX version or commit hash [e.g. v2.4.0]
- Any related environment variables information
Configuration information
Provide your cmake command line that was used for configuration and its full output:
-- The C compiler identification is GNU 8.3.0
-- The CXX compiler identification is GNU 8.3.0
-- The CUDA compiler identification is NVIDIA 11.0.221
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /ccc/products/gcc-8.3.0/system/default/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /ccc/products/gcc-8.3.0/system/default/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/exec/ccache/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found MPI_C: /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi_cxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found CUDAToolkit: /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/include (found suitable version "11.0.221", minimum required is "10.0")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES)
-- Could NOT find OpenMP (missing: OpenMP_CXX_FOUND) (found version "4.5")
This is a MPI build:TRUE
-- Found libcudacxx: /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/dependencies/libcudacxx/lib/cmake/libcudacxx/libcudacxx-config.cmake (found suitable version "1.8.1.0", minimum required is "1.8.0")
-- Found Thrust: /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/thrust/cmake/thrust-config.cmake (found version "2.1.0.0")
-- Found CUB: /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/dependencies/cub/cub/cmake/cub-config.cmake (found suitable version "2.1.0.0", minimum required is "2.1.0.0")
-- Configuring done
-- Generating done
Compilation information Issue information
VERBOSE=1 make
make[2]: Entering directory '/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/build'
[ 0%] Building CUDA object CMakeFiles/amgx_libs.dir/src/tests/dense_lu.cu.o
/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/exec/ccache/nvcc -forward-unknown-to-host-compiler -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -I/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/external/rapidjson/include -I/ccc/products2/openmpi-4.1.4.6/Rhel_8__x86_64/gcc--8.3.0/default/include -I/ccc/products/openmpi-4.1.4/gcc--8.3.0/default/include -I/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/src/../include -I/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/thrust/cmake/../.. -I/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/dependencies/libcudacxx/include -I/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/dependencies/cub/cub/cmake/../.. -L/ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/lib64 -DNDEBUG --generate-code=arch=compute_60,code=[compute_60,sm_60] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] -I/ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/math_libs/include --compiler-options -L/ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/math_libs/lib64 -fPIC -DNDEBUG --extended-lambda --Werror cross-execution-space-call -DNVTX_RANGES -DDISABLE_MIXED_PRECISION -DCUSPARSE_GENERIC_INTERFACES -DCUSPARSE_USE_GENERIC_SPGEMM -Xcompiler "-fno-openmp -Wno-terminate -DRAPIDJSON_DEFINED -DAMGX_WITH_MPI -rdynamic -fPIC -fvisibility=default" -DTHRUST_CUB_WRAPPED_NAMESPACE=amgx -std=c++14 -MD -MT CMakeFiles/amgx_libs.dir/src/tests/dense_lu.cu.o -MF CMakeFiles/amgx_libs.dir/src/tests/dense_lu.cu.o.d -x cu -c /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/src/tests/dense_lu.cu -o CMakeFiles/amgx_libs.dir/src/tests/dense_lu.cu.o
/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/src/tests/dense_lu.cu(114): error: identifier "cudaMallocAsync" is undefined
1 error detected in the compilation of "/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/src/tests/dense_lu.cu".
Additional context
The v2.3.0 build is OK. Another build of v2.4.0 with GCC 11.0.1 and CUDA 11.8 is also OK.
Replacing cudaMallocAsync with amgx::memory::cudaMallocAsync seems to fix it. Is that the correct fix?
Then there are errors when building the examples:
VERBOSE=1 make
cd /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/build/examples && /ccc/products2/cmake-3.22.2/Rhel_8__x86_64/system/default/bin/cmake -E cmake_link_script CMakeFiles/amgx_mpi_capi_agg.dir/link.txt --verbose=1
/ccc/products/gcc-8.3.0/system/default/bin/gcc -DRAPIDJSON_DEFINED -DAMGX_WITH_MPI -O3 -DNDEBUG -L/ccc/products2/openmpi-4.1.4.6/Rhel_8__x86_64/gcc--8.3.0/default/lib -L/ccc/products2/hwloc-2.5.0/Rhel_8__x86_64/system/cuda-11.6/lib -L/ccc/products2/openmpi-4.1.4.6/Rhel_8__x86_64/gcc--8.3.0/default/lib -L/ccc/products2/hwloc-2.5.0/Rhel_8__x86_64/system/cuda-11.6/lib CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o -o amgx_mpi_capi_agg /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi.so ../libamgxsh.so -lrt -ldl /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/lib64/libcudart_static.a /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/../../math_libs/11.0/lib64/libcublas.so /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/../../math_libs/11.0/lib64/libcusolver.so /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/../../math_libs/11.0/lib64/libcublas.so /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/../../math_libs/11.0/lib64/libcusparse.so /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/lib64/libnvToolsExt.so -lm -lpthread /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi_cxx.so /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi.so -ldl -lpthread /usr/lib64/librt.so
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaMallocAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaMallocAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaMallocAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaMallocAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaFreeAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaFreeAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaFreeAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaFreeAsync'
cudaMallocAsync and cudaFreeAsync were introduced in CUDA 11.2, so that is probably the cause here: AmgX v2.4.0 doesn't build with CUDA < 11.2.
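For anyone stuck on CUDA 11.0 in the meantime, the version dependency above can be handled with a compile-time guard on CUDART_VERSION (encoded as major*1000 + minor*10, so 11.2 is 11020). The following is only a sketch of that idea, not the fix that went into AmgX: the names `has_stream_ordered_alloc`, `portable_malloc_async` and `portable_free_async` are hypothetical, and the fallback to plain cudaMalloc/cudaFree is an assumption that gives up stream ordering on older toolkits.

```cpp
#include <cassert>
#include <cstddef>

// Pull in the CUDA runtime API only if the header is available, so this
// sketch also compiles on a machine without a CUDA toolkit.
#if defined(__has_include)
#  if __has_include(<cuda_runtime_api.h>)
#    include <cuda_runtime_api.h>
#  endif
#endif

// cudaMallocAsync / cudaFreeAsync first appear in CUDA 11.2 (CUDART_VERSION 11020).
constexpr int kStreamOrderedAllocMinVersion = 11020;

// Hypothetical helper: true if a runtime of this version has the async allocator.
constexpr bool has_stream_ordered_alloc(int cudart_version) {
    return cudart_version >= kStreamOrderedAllocMinVersion;
}

#if defined(CUDART_VERSION)
// Hypothetical wrapper: stream-ordered allocation when the toolkit has it,
// plain cudaMalloc otherwise (assumed fallback, not stream-ordered).
inline cudaError_t portable_malloc_async(void** ptr, std::size_t bytes, cudaStream_t stream) {
    assert(ptr != nullptr);
#if CUDART_VERSION >= 11020
    return cudaMallocAsync(ptr, bytes, stream);
#else
    (void)stream;  // unused on toolkits older than 11.2
    return cudaMalloc(ptr, bytes);
#endif
}

// Hypothetical wrapper: matching free for portable_malloc_async.
inline cudaError_t portable_free_async(void* ptr, cudaStream_t stream) {
#if CUDART_VERSION >= 11020
    return cudaFreeAsync(ptr, stream);
#else
    (void)stream;  // unused on toolkits older than 11.2
    return cudaFree(ptr);
#endif
}
#endif  // CUDART_VERSION
```

With a guard like this, the CUDA 11.0 build (CUDART_VERSION 11000) would take the cudaMalloc/cudaFree branch, while the 11.8 build would keep using the async calls.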
Thanks for reporting this. I changed our internal testing to include CUDA 11.0 (we previously had just 11.2, 11.8). I'll push a fix to main shortly.
This should hopefully be fixed in main now.
Thanks Matt.