cuda-api-wrappers
std::is_trivially_copy_constructible requirement for kernel parameters still too strong
Even the most basic "fancy iterator" from Thrust, ~~thrust::constant_iterator~~ thrust::counting_iterator, doesn't fulfill the requirement, making algorithms written with cuda-api-wrappers for launching kernels less flexible.
constant_iterator constructor
```cuda
__host__ __device__
constant_iterator(constant_iterator const &rhs)
: super_t(rhs.base()), m_value(rhs.m_value) {}
```
counting_iterator constructor

```cuda
__host__ __device__
counting_iterator(counting_iterator const &rhs) : super_t(rhs.base()) {}
```
See ~~Compiler Explorer~~ Compiler Explorer.
To justify your suggestion, please disprove my logic for choosing this trait:
An argument of a kernel does not get "copy-constructed" in GPU global memory; it is copied as raw bytes. Thus, if it cannot be copy-constructed by trivial copying - it cannot be assumed to be a valid constructed object of the relevant type.
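That reasoning can be demonstrated host-side with plain C++. In the sketch below, a byte-wise round trip stands in for what the launch machinery does with a kernel argument, and the trait is what guarantees the result is still a valid object:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// A plain aggregate: trivially copy-constructible, so a raw byte copy
// yields a valid object of the type.
struct Pair { int x, y; };
static_assert(std::is_trivially_copy_constructible<Pair>::value,
              "Pair can be copied as raw bytes");

// Emulate how a kernel argument travels: the launch machinery copies the
// argument's bytes into a parameter buffer; no copy constructor ever runs.
inline Pair round_trip_through_bytes(const Pair& src) {
    unsigned char buffer[sizeof(Pair)];
    std::memcpy(buffer, &src, sizeof(Pair));   // host -> parameter buffer
    Pair dst;
    std::memcpy(&dst, buffer, sizeof(Pair));   // parameter buffer -> kernel
    return dst;
}
```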
Well, in theory ~~constant_iterator<T>~~ counting_iterator<T> is just a T with some iterator-specific accessor functions, and is therefore trivially copyable if T is. But std::is_trivially_copy_constructible still won't agree to that in practice.
In practice I can't update my research group's codebases written by @codecircuit using cuda-api-wrappers 0.4.x to use >=0.5. And in practice I expect a C++ library for CUDA programming to be composable with Thrust iterators. I don't expect the Thrust team to change their working public API for this static_assert. But if you disagree with that expectation, I can try.
Maybe a preprocessor switch to turn off that static_assert is the right middle ground?
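For illustration, such a switch could look roughly like the sketch below; the macro name CAW_SKIP_KERNEL_PARAM_CHECK is hypothetical, not part of cuda-api-wrappers:

```cpp
#include <type_traits>

// Hypothetical escape hatch: the static_assert fires by default, but a user
// who defines CAW_SKIP_KERNEL_PARAM_CHECK before including the header
// accepts responsibility for the types they pass.
template <typename KernelParameter>
constexpr bool validate_kernel_parameter() {
#ifndef CAW_SKIP_KERNEL_PARAM_CHECK
    static_assert(std::is_trivially_copy_constructible<KernelParameter>::value,
                  "kernel parameters must be trivially copy-constructible");
#endif
    return true;
}
```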
I can't go on what a type should be "in theory". If you write:
```cuda
struct foo {
    int x, y;
    foo(const foo& f) : x(f.x), y(f.y) { }
};
```
I can't accept your struct, since the compiler doesn't "know" that this is equivalent to trivial copy construction. And that copy ctor could, in principle, say, flip x and y if we're on the device side and not flip them on the host side.
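To make that concrete, here is what the trait says about this struct; a sketch using only standard type traits:

```cpp
#include <type_traits>

// The struct from above: the hand-written copy constructor happens to be
// element-wise identical to the trivial one, but the compiler cannot
// prove that, so the trait rejects it.
struct foo {
    int x, y;
    foo(const foo& f) : x(f.x), y(f.y) {}
};
static_assert(!std::is_trivially_copy_constructible<foo>::value,
              "a user-provided copy constructor is never trivial");

// Dropping the user-provided constructor is all it takes:
struct bar { int x, y; };
static_assert(std::is_trivially_copy_constructible<bar>::value,
              "the implicitly-defaulted copy constructor is trivial");
```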
...
but - now that you mention it, I did a bit of DDG'ing, and I'm wondering whether std::is_trivially_copyable is the trait I need. Does the constant_iterator<T> you're interested in satisfy that?
I just chose thrust::constant_iterator because I thought it was the most basic one. (Now that I've read and understood more of the source, thrust::counting_iterator is the most basic in terms of state, being made up of just a T, while constant_iterator needs an additional counter to make distances/ranges work.) In fact, all the Thrust fancy iterators I tested seem to be neither trivially copyable nor trivially copy-constructible. They all seem to contain an internal counting_iterator, so the first question is why that one isn't trivially copyable.
It is a bit hard to decipher the source due to multiple layers of CRTP (counting_iterator -> counting_iterator_base -> iterator_adaptor -> iterator_adaptor_base -> iterator_facade I think), so I'm not sure why it isn't trivially copyable. There is even a
```cuda
#if THRUST_CPP_DIALECT >= 2011
counting_iterator & operator=(const counting_iterator &) = default;
#endif
```
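A defaulted copy assignment alone doesn't help, though. A standalone sketch mirroring that situation shows the trait still rejects the type because of the user-provided copy constructor:

```cpp
#include <type_traits>

// Mirrors the Thrust situation: the copy *assignment* is defaulted and
// trivial, but that does not make the user-provided copy *constructor*
// trivial.
struct iter {
    int v;
    iter(const iter& rhs) : v(rhs.v) {}      // user-provided -> non-trivial
    iter& operator=(const iter&) = default;  // trivial, but irrelevant here
};
static_assert(std::is_trivially_copy_assignable<iter>::value,
              "the defaulted assignment is trivial");
static_assert(!std::is_trivially_copy_constructible<iter>::value,
              "the hand-written constructor is not");
```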
What I know is that all these iterator types are supposed to be safe in regards to passing them as kernel parameters.
Let's say I'm willing to accept it on faith that these types are safe for passing. So, shall I hard-code somehow for all the thrust iterator types you tell me about? That doesn't sound right (if it's at all doable). But if I don't hard-code, and instead just remove the restriction - who's to say users won't pass a struct foo from my example?
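One alternative to hard-coding type names would be a customization-point trait that defaults to the standard check but can be specialized by users who vouch for specific types. A hypothetical sketch (is_valid_kernel_parameter is an invented name, not library API):

```cpp
#include <type_traits>

// Defaults to the standard requirement...
template <typename T>
struct is_valid_kernel_parameter : std::is_trivially_copy_constructible<T> {};

// ...but a user who trusts a particular fancy-iterator-like type can opt
// it in explicitly, taking responsibility for its byte-copy safety.
struct fancy_iterator {
    int v;
    fancy_iterator(const fancy_iterator& o) : v(o.v) {}  // non-trivial
};
template <>
struct is_valid_kernel_parameter<fancy_iterator> : std::true_type {};

static_assert(!std::is_trivially_copy_constructible<fancy_iterator>::value,
              "the standard trait rejects it");
static_assert(is_valid_kernel_parameter<fancy_iterator>::value,
              "the customization point accepts it");
```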
"I know it should be safe" is not a criterion I can work with... how about you open a bug report against thrust (or the unified CCCL) about this, and post a link here? I'm willing to have a discussion about this with the developers there. Maybe there's an option I'm failing to consider.
https://github.com/NVIDIA/cccl/discussions/562
@pauleonix : Do you think there'll be any traction from the NVIDIA side?
I don't see what's the big deal in just fixing the iterators on their side.
What I know is that they are working on ranges, so I expect some kind of iota_view etc. in the not-too-distant future. Whether that will fix this particular issue, or will still use the existing iterators or a similar mechanism under the hood, I don't know.
So, we can close this. But - note I am considering strengthening the requirements further: #642 .