libcudacxx
libcudacxx copied to clipboard
[BUG] atomic is not trivially copyable when used inside template class
With libcudacxx 1.6.0, nvcc -std=c++17 atomic_test.cu -arch=sm_70
, following code causes
atomic_test.cu(7): error C2338: device_uvector only supports types that are trivially copyable.
atomic_test.cu(24): note: see reference to class template instantiation 'rmm::device_uvector<cuda::__4::atomic<scan_tile_status,cuda::std::__4::__detail::thread_scope_device>>' being compiled
#include <cuda/atomic>
namespace rmm {
template <typename T>
class device_uvector {
static_assert(std::is_trivially_copyable<T>::value, //Line 7
"device_uvector only supports types that are trivially copyable.");
public:
device_uvector(size_t i) {}
};
}
enum class scan_tile_status : uint8_t {
oob,
invalid,
partial,
inclusive,
};
using type_t = cuda::atomic<scan_tile_status, cuda::thread_scope_device>;
int main() {
static_assert(std::is_trivially_copyable<type_t>::value, "type is not trivially copyable."); //Line 22
rmm::device_uvector<type_t> tile_status(10); // Line 24
}
static_assert in line 22 passed. But static_assert in line 7 failed! Both are same types, only difference is the template type. Same code compiles in Linux.
Details
Windows 10used libcudacxx 1.6.0
nvcc -std=c++17 atomic_test.cu -arch=sm_70 -I C:\Users\user\Documents\cudf\libcudacxx-1.6.0\include
nvcc --version Built on Sun_Feb_14_22:08:44_Pacific_Standard_Time_2021 Cuda compilation tools, release 11.2, V11.2.152 Build cuda_11.2.r11.2/compiler.29618528_0
The difference for here is probably which of the compiler frontends (i.e. nvcc or the host compiler) is instantiating the static assertion.
For the record, atomics are not copyable, and the bug is in the assertion in main, not the one that's failing; I hope that this is just one of the compilers falsely claiming it is copyable from the intrinsics rather than actually making the type copyable...
Oh. Just noticed you said it compiles on Linux, I wonder why that is.
We seem to have lost the deleted copy ctor declaration. That's the real bug here.
deleted copy ctor declaration of cuda atomic? Will it stop working in Linux after the fix?
Yes; see https://en.cppreference.com/w/cpp/atomic/atomic/atomic.
Right now, device_uvector<cuda::atomic...
is used in
https://github.com/rapidsai/cudf/blob/baf8275ffaecb756092dc2c8819602cd2cff9396/cpp/include/cudf/io/text/detail/tile_state.hpp#L64
Added a test to enforce this and fix in #219...
cc @cwharris
I don't really have an upshot for you @karthikeyann. A possible solution is to replace:
rmm::device_uvector<cuda::atomic<scan_tile_status, cuda::thread_scope_device>> tile_status;
with:
rmm::device_uvector<scan_tile_status> tile_status;
using atom_accessor = cuda::atomic_ref<scan_tile_status, cuda::thread_scope_device>;
However, it looks this value you're storing is a uint8_t
, which we don't yet support in atomics. We currently resize them to uint32_t
.
If you resize this to uint32_t
everything would work the same...
enum class scan_tile_status : uint32_t
This is a bit surgical I'm afraid. :(
changing scan_tile_status to 33 bit shouldn't be a major issue, but last I checked (which was a while ago) atomic_ref was WIP. Is that still the case?
I'll take a look at other workarounds. I recall discussing this with Jake during review, but I'll go back and check the comments there before tagging him in the thread.
@wmaxey @griwes Given that the original bug has been fixed I believe we should close this