omega_h

Build error with CUDA 11.2.1

Open lahwaacz opened this issue 4 years ago • 2 comments

The build fails with CUDA 11.2.1 on Arch Linux. nvcc_wrapper is available in /usr/bin/ as part of the trilinos package. I'm not building with Kokkos, although it is also included in Trilinos.

CMake output:

$ mkdir build
$ cd build
$ cmake .. -DOmega_h_USE_CUDA=on -DCMAKE_CXX_COMPILER=nvcc_wrapper
-- The CXX compiler identification is GNU 10.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/nvcc_wrapper - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_VERSION: 3.19.7
-- Omega_h_VERSION: 9.33.2
-- USE_XSDK_DEFAULTS: OFF
-- BUILD_TESTING: OFF
-- BUILD_SHARED_LIBS: ON
-- CMAKE_INSTALL_PREFIX: /usr/local
-- Omega_h_CHECK_BOUNDS: OFF
-- Omega_h_THROW: OFF
-- Omega_h_DATA: 
-- Omega_h_USE_EGADS: OFF
-- EGADS_PREFIX: 
-- Omega_h_USE_Kokkos: OFF
-- Kokkos_PREFIX: 
-- Omega_h_USE_CUDA_AWARE_MPI: OFF
-- Omega_h_VALGRIND: 
-- Omega_h_EXAMPLES: OFF
-- Omega_h_USE_MPI: OFF
-- Omega_h_USE_ZLIB: ON
-- ZLIB_PREFIX: 
-- Found ZLIB: /usr/lib/libz.so (found version "1.2.11") 
-- Omega_h_USE_Kokkos: OFF
-- Omega_h_USE_libMeshb: OFF
-- Omega_h_USE_Gmsh: OFF
-- Omega_h_USE_Gmodel: OFF
-- Omega_h_USE_SEACASExodus: OFF
-- Omega_h_USE_pybind11: OFF
-- Omega_h_USE_OpenMP: OFF
-- Omega_h_USE_CUDA: on
-- The CUDA compiler identification is NVIDIA 11.2.142
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Omega_h_USE_DOLFIN: OFF
-- Omega_h_SEMVER = 9.33.2-sha.91330909+000110000000001
-- Configuring done
-- Generating done
-- Build files have been written to: /home/lahwaacz/Bbox/pg/cpp/3rd party/omega_h/build

Build output:

$ make
...
$ make -j1
[  1%] Building CUDA object src/CMakeFiles/omega_h.dir/Omega_h_int_scan.cpp.o
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/scan.h(578): error: array of reference is not allowed
          detected during:
            instantiation of class "thrust::cuda_cub::__scan::DoNothing<T> [with T=const Omega_h::LO &]" 
(784): here
            instantiation of "OutputIt thrust::cuda_cub::inclusive_scan_n(thrust::cuda_cub::execution_policy<Derived> &, InputIt, Size, OutputIt, ScanOp) [with Derived=thrust::cuda_cub::par_t, InputIt=thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, Size=std::ptrdiff_t, OutputIt=Omega_h::LO *, ScanOp=thrust::maximum<Omega_h::LO>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/transform_scan.h(72): here
            instantiation of "OutputIt thrust::cuda_cub::transform_inclusive_scan(thrust::cuda_cub::execution_policy<Derived> &, InputIt, InputIt, OutputIt, TransformOp, ScanOp) [with Derived=thrust::cuda_cub::par_t, InputIt=Omega_h::LO *, OutputIt=Omega_h::LO *, TransformOp=thrust::identity<Omega_h::LO>, ScanOp=thrust::maximum<Omega_h::LO>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/detail/transform_scan.inl(47): here
            instantiation of "OutputIterator thrust::transform_inclusive_scan(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, OutputIterator, UnaryFunction, AssociativeOperator) [with DerivedPolicy=thrust::cuda_cub::par_t, InputIterator=Omega_h::LO *, OutputIterator=Omega_h::LO *, UnaryFunction=thrust::identity<Omega_h::LO>, AssociativeOperator=thrust::maximum<Omega_h::LO>]" 
/home/lahwaacz/Bbox/pg/cpp/3rd party/omega_h/src/Omega_h_scan.hpp(84): here
            instantiation of "OutputIterator Omega_h::transform_inclusive_scan(InputIterator, InputIterator, OutputIterator, BinaryOp, UnaryOp) [with InputIterator=Omega_h::LO *, OutputIterator=Omega_h::LO *, BinaryOp=Omega_h::maximum<Omega_h::LO>, UnaryOp=Omega_h::identity<Omega_h::LO>]" 
/home/lahwaacz/Bbox/pg/cpp/3rd party/omega_h/src/Omega_h_int_scan.cpp(32): here

/opt/cuda/bin/../targets/x86_64-linux/include/cub/block/block_load.cuh(974): error: array of reference is not allowed
          detected during:
            instantiation of class "cub::BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal<cub::BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED, DUMMY> [with InputT=const Omega_h::LO &, BLOCK_DIM_X=128, ITEMS_PER_THREAD=12, ALGORITHM=cub::BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, PTX_ARCH=520, DUMMY=0]" 
(1015): here
            instantiation of class "cub::BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH> [with InputT=const Omega_h::LO &, BLOCK_DIM_X=128, ITEMS_PER_THREAD=12, ALGORITHM=cub::BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, PTX_ARCH=520]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/scan.h(264): here
            instantiation of union "thrust::cuda_cub::__scan::ScanAgent<InputIt, OutputIt, ScanOp, Size, T, Inclusive>::PtxPlan<Arch>::TempStorage [with InputIt=thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, OutputIt=Omega_h::LO *, ScanOp=thrust::maximum<Omega_h::LO>, Size=thrust::detail::int32_t, T=const Omega_h::LO &, Inclusive=thrust::detail::true_type, Arch=thrust::cuda_cub::core::sm52]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h(202): here
            instantiation of class "thrust::cuda_cub::core::temp_storage_size_impl<Agent, thrust::detail::true_type> [with Agent=thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__scan::ScanAgent<thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, Omega_h::LO *, thrust::maximum<Omega_h::LO>, thrust::detail::int32_t, const Omega_h::LO &, thrust::detail::true_type>::PtxPlan, thrust::cuda_cub::core::sm60>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h(207): here
            instantiation of class "thrust::cuda_cub::core::temp_storage_size<Agent> [with Agent=thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__scan::ScanAgent<thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, Omega_h::LO *, thrust::maximum<Omega_h::LO>, thrust::detail::int32_t, const Omega_h::LO &, thrust::detail::true_type>::PtxPlan, thrust::cuda_cub::core::sm60>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h(225): here
            [ 4 instantiation contexts not shown ]
            instantiation of "OutputIt thrust::cuda_cub::__scan::scan<Inclusive,Derived,InputIt,OutputIt,Size,ScanOp,AddInitToExclusiveScan>(thrust::cuda_cub::execution_policy<Derived> &, InputIt, OutputIt, Size, ScanOp, AddInitToExclusiveScan) [with Inclusive=thrust::detail::true_type, Derived=thrust::cuda_cub::par_t, InputIt=thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, OutputIt=Omega_h::LO *, Size=std::ptrdiff_t, ScanOp=thrust::maximum<Omega_h::LO>, AddInitToExclusiveScan=thrust::cuda_cub::__scan::DoNothing<const Omega_h::LO &>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/scan.h(784): here
            instantiation of "OutputIt thrust::cuda_cub::inclusive_scan_n(thrust::cuda_cub::execution_policy<Derived> &, InputIt, Size, OutputIt, ScanOp) [with Derived=thrust::cuda_cub::par_t, InputIt=thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, Size=std::ptrdiff_t, OutputIt=Omega_h::LO *, ScanOp=thrust::maximum<Omega_h::LO>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/transform_scan.h(72): here
            instantiation of "OutputIt thrust::cuda_cub::transform_inclusive_scan(thrust::cuda_cub::execution_policy<Derived> &, InputIt, InputIt, OutputIt, TransformOp, ScanOp) [with Derived=thrust::cuda_cub::par_t, InputIt=Omega_h::LO *, OutputIt=Omega_h::LO *, TransformOp=thrust::identity<Omega_h::LO>, ScanOp=thrust::maximum<Omega_h::LO>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/detail/transform_scan.inl(47): here
            instantiation of "OutputIterator thrust::transform_inclusive_scan(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, OutputIterator, UnaryFunction, AssociativeOperator) [with DerivedPolicy=thrust::cuda_cub::par_t, InputIterator=Omega_h::LO *, OutputIterator=Omega_h::LO *, UnaryFunction=thrust::identity<Omega_h::LO>, AssociativeOperator=thrust::maximum<Omega_h::LO>]" 
/home/lahwaacz/Bbox/pg/cpp/3rd party/omega_h/src/Omega_h_scan.hpp(84): here
            instantiation of "OutputIterator Omega_h::transform_inclusive_scan(InputIterator, InputIterator, OutputIterator, BinaryOp, UnaryOp) [with InputIterator=Omega_h::LO *, OutputIterator=Omega_h::LO *, BinaryOp=Omega_h::maximum<Omega_h::LO>, UnaryOp=Omega_h::identity<Omega_h::LO>]" 
/home/lahwaacz/Bbox/pg/cpp/3rd party/omega_h/src/Omega_h_int_scan.cpp(32): here

/opt/cuda/bin/../targets/x86_64-linux/include/cub/block/block_load.cuh(984): error: array of reference is not allowed
          detected during:
            instantiation of class "cub::BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal<cub::BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED, DUMMY> [with InputT=const Omega_h::LO &, BLOCK_DIM_X=128, ITEMS_PER_THREAD=12, ALGORITHM=cub::BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, PTX_ARCH=520, DUMMY=0]" 
(1015): here
            instantiation of class "cub::BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH> [with InputT=const Omega_h::LO &, BLOCK_DIM_X=128, ITEMS_PER_THREAD=12, ALGORITHM=cub::BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, PTX_ARCH=520]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/scan.h(264): here
            instantiation of union "thrust::cuda_cub::__scan::ScanAgent<InputIt, OutputIt, ScanOp, Size, T, Inclusive>::PtxPlan<Arch>::TempStorage [with InputIt=thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, OutputIt=Omega_h::LO *, ScanOp=thrust::maximum<Omega_h::LO>, Size=thrust::detail::int32_t, T=const Omega_h::LO &, Inclusive=thrust::detail::true_type, Arch=thrust::cuda_cub::core::sm52]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h(202): here
            instantiation of class "thrust::cuda_cub::core::temp_storage_size_impl<Agent, thrust::detail::true_type> [with Agent=thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__scan::ScanAgent<thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, Omega_h::LO *, thrust::maximum<Omega_h::LO>, thrust::detail::int32_t, const Omega_h::LO &, thrust::detail::true_type>::PtxPlan, thrust::cuda_cub::core::sm60>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h(207): here
            instantiation of class "thrust::cuda_cub::core::temp_storage_size<Agent> [with Agent=thrust::cuda_cub::core::specialize_plan<thrust::cuda_cub::__scan::ScanAgent<thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, Omega_h::LO *, thrust::maximum<Omega_h::LO>, thrust::detail::int32_t, const Omega_h::LO &, thrust::detail::true_type>::PtxPlan, thrust::cuda_cub::core::sm60>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/core/util.h(225): here
            [ 4 instantiation contexts not shown ]
            instantiation of "OutputIt thrust::cuda_cub::__scan::scan<Inclusive,Derived,InputIt,OutputIt,Size,ScanOp,AddInitToExclusiveScan>(thrust::cuda_cub::execution_policy<Derived> &, InputIt, OutputIt, Size, ScanOp, AddInitToExclusiveScan) [with Inclusive=thrust::detail::true_type, Derived=thrust::cuda_cub::par_t, InputIt=thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, OutputIt=Omega_h::LO *, Size=std::ptrdiff_t, ScanOp=thrust::maximum<Omega_h::LO>, AddInitToExclusiveScan=thrust::cuda_cub::__scan::DoNothing<const Omega_h::LO &>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/scan.h(784): here
            instantiation of "OutputIt thrust::cuda_cub::inclusive_scan_n(thrust::cuda_cub::execution_policy<Derived> &, InputIt, Size, OutputIt, ScanOp) [with Derived=thrust::cuda_cub::par_t, InputIt=thrust::cuda_cub::transform_input_iterator_t<const Omega_h::LO &, Omega_h::LO *, thrust::identity<Omega_h::LO>>, Size=std::ptrdiff_t, OutputIt=Omega_h::LO *, ScanOp=thrust::maximum<Omega_h::LO>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/transform_scan.h(72): here
            instantiation of "OutputIt thrust::cuda_cub::transform_inclusive_scan(thrust::cuda_cub::execution_policy<Derived> &, InputIt, InputIt, OutputIt, TransformOp, ScanOp) [with Derived=thrust::cuda_cub::par_t, InputIt=Omega_h::LO *, OutputIt=Omega_h::LO *, TransformOp=thrust::identity<Omega_h::LO>, ScanOp=thrust::maximum<Omega_h::LO>]" 
/opt/cuda/bin/../targets/x86_64-linux/include/thrust/detail/transform_scan.inl(47): here
            instantiation of "OutputIterator thrust::transform_inclusive_scan(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, OutputIterator, UnaryFunction, AssociativeOperator) [with DerivedPolicy=thrust::cuda_cub::par_t, InputIterator=Omega_h::LO *, OutputIterator=Omega_h::LO *, UnaryFunction=thrust::identity<Omega_h::LO>, AssociativeOperator=thrust::maximum<Omega_h::LO>]" 
/home/lahwaacz/Bbox/pg/cpp/3rd party/omega_h/src/Omega_h_scan.hpp(84): here
            instantiation of "OutputIterator Omega_h::transform_inclusive_scan(InputIterator, InputIterator, OutputIterator, BinaryOp, UnaryOp) [with InputIterator=Omega_h::LO *, OutputIterator=Omega_h::LO *, BinaryOp=Omega_h::maximum<Omega_h::LO>, UnaryOp=Omega_h::identity<Omega_h::LO>]" 
/home/lahwaacz/Bbox/pg/cpp/3rd party/omega_h/src/Omega_h_int_scan.cpp(32): here

The full output is much longer, so I've copy-pasted just the first 3 actual errors.
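
All three errors above share one pattern: a template parameter is reported as const Omega_h::LO & (T in the first error, InputT in the other two) and the compiler then rejects an array of that type. Below is a minimal sketch of that pattern in plain C++, independent of Thrust and CUB. The names Buffer, SafeBuffer, first_item_roundtrip and the LO alias are hypothetical and exist only for illustration. Stripping the reference with std::remove_reference is one way such code can be made to compile; it is not necessarily what the upstream fix does.

```cpp
#include <cstdint>
#include <type_traits>

// Assumption: stands in for Omega_h::LO, which the log only shows by name.
using LO = std::int32_t;

// Mirrors the failing pattern: T is used directly as an array element type.
// Instantiating Buffer<const LO&> is ill-formed, because C++ does not allow
// arrays of references ("array of reference is not allowed" in nvcc's words).
template <typename T>
struct Buffer {
  T items[4];
};

// Hypothetical workaround: strip the reference and cv-qualifiers first,
// so the array always holds plain values.
template <typename T>
struct SafeBuffer {
  using value_type = typename std::remove_cv<
      typename std::remove_reference<T>::type>::type;
  value_type items[4];
};

// The reference type from the log collapses to a plain LO.
static_assert(std::is_same<SafeBuffer<const LO&>::value_type, LO>::value,
              "SafeBuffer should store plain LO values");

// Stores a value in a SafeBuffer instantiated with a reference type
// and reads it back.
inline LO first_item_roundtrip(LO value) {
  SafeBuffer<const LO&> buf{};
  buf.items[0] = value;
  return buf.items[0];
}
```

Buffer<LO> compiles fine, so the problem only appears once a reference type reaches the array declaration, which matches the instantiation chains in the log.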

lahwaacz, Mar 19 '21 18:03

I also opened an issue for this: https://github.com/SCOREC/omega_h/issues/14

@ibaned reported this to the Thrust team, and it is supposed to be fixed in CUDA 11.3.

cwsmith, Jun 04 '21 20:06

Fundamentally, CUDA 11.2 is unusable for us. Please use either an earlier or a newer version.
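
A project that wants to fail early with a readable message, rather than with the template errors above, could add a compile-time version check. The following is only a sketch: is_unsupported_cuda is a hypothetical helper, not part of omega_h, and it relies on the __CUDACC_VER_MAJOR__ and __CUDACC_VER_MINOR__ macros that nvcc defines. With any other compiler the guard is skipped.

```cpp
// Hypothetical helper, not part of omega_h: returns true for the
// CUDA 11.2 series, which this thread reports as unusable.
constexpr bool is_unsupported_cuda(int major, int minor) {
  return major == 11 && minor == 2;
}

// nvcc defines these macros; host-only compilers do not, so the
// check only takes effect when compiling with nvcc.
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
static_assert(
    !is_unsupported_cuda(__CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__),
    "CUDA 11.2 is not supported; use an earlier or a newer CUDA toolkit");
#endif
```

Keeping the version test in a constexpr function keeps the rule in one place and lets it be checked without a CUDA toolchain.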

ibaned, Jun 04 '21 20:06