cuda-api-wrappers icon indicating copy to clipboard operation
cuda-api-wrappers copied to clipboard

std::is_trivially_copy_constructible requirement for kernel parameters still too strong

Open pauleonix opened this issue 2 years ago • 6 comments

Even the most basic "fancy iterator" from Thrust ~~thrust::constant_iterator~~ thrust::counting_iterator doesn't fulfill the requirement making algorithms written using cuda-api-wrappers for launching kernels less flexible.

constant_iterator constructor ```cuda __host__ __device__ constant_iterator(constant_iterator const &rhs) : super_t(rhs.base()), m_value(rhs.m_value) {} ```
    __host__ __device__
    counting_iterator(counting_iterator const &rhs):super_t(rhs.base()){}

See ~~Compiler Explorer~~ Compiler Explorer.

pauleonix avatar Oct 13 '23 13:10 pauleonix

To justify your suggestion, please disprove my logic for choosing this trait:

An argument of a kernel does not get "copy-constructed" in GPU global memory; it is copied as raw bytes. Thus, if it cannot be copy-constructed by trivial copying - it cannot be assumed to be a valid constructed object of the relevant type.

eyalroz avatar Oct 13 '23 15:10 eyalroz

Well, in theory ~~constant_iterator<T>~~ counting_iterator<T> is just a T with some iterator-specific accessor functions and therefore trivially copyable if T is. But std::is_trivially_copy_constructible still wont agree to that in practice.

In practice I can't update my research group's codebases written by @codecircuit using cuda-api-wrappers 0.4.x to use >=0.5. And in practice I expect a C++ library for CUDA programming to be composable with Thrust iterators. I don't expect the Thrust team to change their working public API for this static_assert. But if you disagree with that expectation, I can try.

Maybe a preprocessor switch to turn off that static_assert is the right middle ground?

pauleonix avatar Oct 13 '23 16:10 pauleonix

I can't go on what a type should be "in theory". If you write:

struct foo {
    int x, y;
    foo(const foo& f) : x(f.x), y(f.y) { }
}

I can't accept your struct, since the compiler doesn't "know" that this is equivalent to trivial copy construction. And that copy ctor could, in principle, say, flip x and y if we're on the device side and not flip them on the host side.

...

but - now that you mention it, I did a bit of DDG'ing, and I'm wondering whether std::is_trivially_copyable is the trait I need. Does the const_iterator<T> you're interested in satisfy that?

eyalroz avatar Oct 13 '23 17:10 eyalroz

I just chose thrust::const_iterator, because I thought it were the most basic (now that I read/understood more of the source, thrust::counting_iterator is the most basic one made up just of a T in terms of state while constant_iterator needs an additional counter to make distances/ranges work). In fact all Thrust fancy iterators I tested seem to be neither trivially copyable nor trivially copy-constructible. They all seem to encompass an internal counting_iterator so the first question is why that one isn't trivially copyable.

It is a bit hard to decipher the source due to multiple layers of CRTP (counting_iterator -> counting_iterator_base -> iterator_adaptor -> iterator_adaptor_base -> iterator_facade I think), so I'm not sure why it isn't trivially copyable. There is even a

#if THRUST_CPP_DIALECT >= 2011
    counting_iterator & operator=(const counting_iterator &) = default;
#endif

What I know is that all these iterator types are supposed to be safe in regards to passing them as kernel parameters.

pauleonix avatar Oct 13 '23 19:10 pauleonix

What I know is that all these iterator types are supposed to be safe in regards to passing them as kernel parameters.

Let's say I'm willing to accept it on faith that these types are safe for passing. So, shall I hard-code somehow for all the thrust iterator types you tell me about? That doesn't sound right (if it's at all doable). But if I don't hard-code, and instead just remove the restriction - who's to say users won't pass a struct foo from my example?

"I know it should be safe" is not a criterion I can work with... how about you open a bug report against thrust (or the unified CCCL) about this, and post a link here? I'm willing to have a discussion about this with the developers there. Maybe there's an option I'm failing to consider.

eyalroz avatar Oct 13 '23 20:10 eyalroz

https://github.com/NVIDIA/cccl/discussions/562

pauleonix avatar Oct 14 '23 00:10 pauleonix

@pauleonix : Do you think there'll be any traction from the NVIDIA side?

eyalroz avatar Feb 09 '24 22:02 eyalroz

I don't see what's the big deal in just fixing the iterators on their side.

eyalroz avatar Feb 09 '24 22:02 eyalroz

What I know is that they are working on ranges, so I expect some kind of iota_view etc. in the not-too-far future. If this will fix this particular issue or still use the existing iterators or a similar mechanism under the hood, I don't know.

pauleonix avatar Feb 12 '24 10:02 pauleonix

So, we can close this. But - note I am considering strengthening the requirements further: #642 .

eyalroz avatar Apr 19 '24 22:04 eyalroz