thrust icon indicating copy to clipboard operation
thrust copied to clipboard

Improve diagnostic when the (default) CUDA device system is used in non-CUDA builds.

Open jeffhammond opened this issue 2 years ago • 9 comments

I can't tell where the problem is. Why does the version of Thrust that I get with stdpar=gpu work, whereas the version from GitHub doesn't?

Thanks

MCVE

#include <thrust/universal_vector.h>

thrust::universal_vector<float> m_x ;

void AllocateNodePersistent(int numNode)
{
  m_x.resize(numNode);
}

It works with stdpar=gpu

$ nvc++ -std=c++17 -stdpar=gpu -c bug.cc && echo OKAY
OKAY

It fails with 25547a4308fc87e2f5e6cb0f71f6f4e45aabe726

$ nvc++ -std=c++17 -I/home/jhammond/NVIDIA/thrust -c bug.cc
"/home/jhammond/NVIDIA/thrust/thrust/system/detail/generic/for_each.h", line 65: error: static assertion failed with "unimplemented for this system"
    THRUST_STATIC_ASSERT_MSG(
    ^
          detected during:
            instantiation of "InputIterator thrust::system::detail::generic::for_each_n(thrust::execution_policy<DerivedPolicy> &, InputIterator, Size, UnaryFunction) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, UnaryFunction=thrust::detail::device_generate_functor<thrust::detail::fill_functor<float>>]" at line 67 of "/home/jhammond/NVIDIA/thrust/thrust/detail/for_each.inl"
            instantiation of "InputIterator thrust::for_each_n(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, Size, UnaryFunction) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, UnaryFunction=thrust::detail::device_generate_functor<thrust::detail::fill_functor<float>>]" at line 93 of "/home/jhammond/NVIDIA/thrust/thrust/system/detail/generic/generate.inl"
            instantiation of "OutputIterator thrust::system::detail::generic::generate_n(thrust::execution_policy<ExecutionPolicy> &, OutputIterator, Size, Generator) [with ExecutionPolicy=thrust::cuda_cub::tag, OutputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, Generator=thrust::detail::fill_functor<float>]" at line 56 of "/home/jhammond/NVIDIA/thrust/thrust/detail/generate.inl"
            instantiation of "OutputIterator thrust::generate_n(const thrust::detail::execution_policy_base<DerivedPolicy> &, OutputIterator, Size, Generator) [with DerivedPolicy=thrust::cuda_cub::tag, OutputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, Generator=thrust::detail::fill_functor<float>]" at line 42 of "/home/jhammond/NVIDIA/thrust/thrust/system/detail/generic/fill.h"
            instantiation of "OutputIterator thrust::system::detail::generic::fill_n(thrust::execution_policy<DerivedPolicy> &, OutputIterator, Size, const T &) [with DerivedPolicy=thrust::cuda_cub::tag, OutputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, T=float]" at line 51 of "/home/jhammond/NVIDIA/thrust/thrust/detail/fill.inl"
            [ 4 instantiation contexts not shown ]
            instantiation of "thrust::detail::disable_if<thrust::detail::allocator_traits_detail::needs_default_construct_via_allocator<Allocator, thrust::detail::pointer_element<Pointer>::type>::value, void>::type thrust::detail::allocator_traits_detail::default_construct_range(Allocator &, Pointer, Size) [with Allocator=thrust::cuda_cub::universal_allocator<float>, Pointer=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t]" at line 106 of "/home/jhammond/NVIDIA/thrust/thrust/detail/allocator/default_construct_range.inl"
            instantiation of "void thrust::detail::default_construct_range(Allocator &, Pointer, Size) [with Allocator=thrust::cuda_cub::universal_allocator<float>, Pointer=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t]" at line 254 of "/home/jhammond/NVIDIA/thrust/thrust/detail/contiguous_storage.inl"
            instantiation of "void thrust::detail::contiguous_storage<T, Alloc>::default_construct_n(thrust::detail::contiguous_storage<T, Alloc>::iterator, thrust::detail::contiguous_storage<T, Alloc>::size_type) [with T=float, Alloc=thrust::cuda_cub::universal_allocator<float>]" at line 877 of "/home/jhammond/NVIDIA/thrust/thrust/detail/vector_base.inl"
            instantiation of "void thrust::detail::vector_base<T, Alloc>::append(thrust::detail::vector_base<T, Alloc>::size_type) [with T=float, Alloc=thrust::cuda_cub::universal_allocator<float>]" at line 321 of "/home/jhammond/NVIDIA/thrust/thrust/detail/vector_base.inl"
            instantiation of "void thrust::detail::vector_base<T, Alloc>::resize(thrust::detail::vector_base<T, Alloc>::size_type) [with T=float, Alloc=thrust::cuda_cub::universal_allocator<float>]" at line 7 of "bug.cc"

1 error detected in the compilation of "bug.cc".

jeffhammond avatar Apr 20 '22 11:04 jeffhammond

You're compiling bug.cc. That means C++ mode, CUDA mode isn't enabled. Do -cuda or name the file bug.cu.

brycelelbach avatar Apr 20 '22 17:04 brycelelbach

Although... this raises the question - why would this be broken for the default backend?

brycelelbach avatar Apr 20 '22 17:04 brycelelbach

why would this be broken for the default backend?

By default, Thrust selects HOST=CPP and DEVICE=CUDA. This can be changed by explicitly defining the THRUST_HOST_SYSTEM and THRUST_DEVICE_SYSTEM macros.

Perhaps nvc++ should set THRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP when CUDA and multicore are disabled. @dkolsen-pgi does this sound reasonable to you?

alliepiper avatar Apr 22 '22 16:04 alliepiper

NVC++ shouldn't set THRUST_DEVICE_SYSTEM or related macros when neither CUDA nor stdpar modes are enabled. When NVC++ is in normal C++ mode, it should treat Thrust the same as any other compiler (that is not NVCC).

dkolsen-pgi avatar Apr 22 '22 16:04 dkolsen-pgi

Allison - regardless of how Thrust got into that mode, universal_vector should work with all backends, so this is probably a bug of some sort.

On Fri, Apr 22, 2022 at 12:32 PM David Olsen @.***> wrote:

NVC++ shouldn't set THRUST_DEVICE_SYSTEM or related macros when neither CUDA nor stdpar modes are enabled. When NVC++ is in normal C++ mode, it should treat Thrust the same as any other compiler (that is not NVCC).

— Reply to this email directly, view it on GitHub https://github.com/NVIDIA/thrust/issues/1663#issuecomment-1106678118, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADBG4QSP6ON4WTYER4OZCLVGLIB3ANCNFSM5T3WIMPQ . You are receiving this because you modified the open/close state.Message ID: @.***>

-- Bryce Adelstein Lelbach aka wash (he/him/his) US Programming Language Standards (PL22) Chair ISO C++ Library Evolution Chair HPC Programming Models Architect @ NVIDIA

brycelelbach avatar Apr 22 '22 16:04 brycelelbach

When NVC++ is in normal C++ mode, it should treat Thrust the same as any other compiler (that is not NVCC).

This is exactly how it works with other C++ compilers. Trying to build the above reproducer with g++ fails in the exact same way, unless the device system is explicitly set to CPP. The only way to fix that on our side is to change the default system, which would be very disruptive.

universal_vector should work with all backends

universal_vector is tested on all combinations of host/device backends and there haven't been any issues. The problem in this case is that the default CUDA backend is being used without compiler support for CUDA.

alliepiper avatar Apr 22 '22 16:04 alliepiper

I was conflating the nvc++ stdpar usage of Thrust with the actual issue here, which is trying to use Thrust functionality in user code when building with nvc++. I agree that nvc++ shouldn't be modifying any Thrust configs in this case, as it's the user's responsibility to configure this.

@jeffhammond There are two ways to fix this, depending on what you need:

  1. As Bryce mentioned, enabling CUDA mode will fix this if your goal is to use universal_vector's memory on a CUDA device.
  2. If your goal is to not target CUDA but still use universal_vector, you'll need to explicitly set THRUST_DEVICE_SYSTEM to use CPP, OMP, or TBB instead of CUDA.

Let me know if there's anything I missed or if we can close this.

alliepiper avatar Apr 22 '22 17:04 alliepiper

Why isn't not setting CUDA mode sufficient to enable a non-CUDA back end, or, at the very least, trigger a warning that it creates a scenario where Thrust doesn't work?

jeffhammond avatar Apr 22 '22 17:04 jeffhammond

Why isn't not setting CUDA mode sufficient to enable a non-CUDA backend?

It's up to the user to pick the system that they want to use -- we can't predict whether they want to use serial, OpenMP, or TBB for non-CUDA builds.

or, at the very least, trigger a warning that it creates a scenario where Thrust doesn't work?

It does emit a diagnostic stating that functionality is missing for a currently selected system:

static assertion failed with "unimplemented for this system"

I agree that this could be improved. I'll update this issue to reflect that we need a better diagnostic in this case.

alliepiper avatar Apr 22 '22 17:04 alliepiper