libcudacxx
libcudacxx copied to clipboard
CUDA Tuple Structured Binding Declaration Broken
I'm running into issues where cuda::std::tuple
does not seem to support structured binding declarations. Is this a feature that should work but is broken? Is this unsupported for cuda::std::tuple
specifically?
Note that the normal std::tuple
supports this cpp17 feature, even in device code. If I comment out the cuda_tuple_kernel
and its calls in the main()
function, the code compiles without issues.
Code:
#include <cuda/std/tuple>
#include <tuple>
#define CHECK_CUDA(cmd) \
do { \
cudaError_t res = (cmd); \
if (res != cudaSuccess) { \
fprintf(stderr, "CUDA: %s = %d (%s)\n", #cmd, res, \
cudaGetErrorString(res)); \
abort(); \
} \
} while (0)
// This works
__global__ void std_tuple_kernel()
{
std::tuple<bool, float> my_tup = std::make_tuple(true, 1.0f);
printf("Direct access: %d, %f\n", std::get<0>(my_tup), std::get<1>(my_tup));
auto [first, second] = my_tup;
printf("Structured binding: %d, %f\n", first, second);
}
// This fails to compile
__global__ void cuda_tuple_kernel()
{
cuda::std::tuple<bool, float> my_tup = cuda::std::make_tuple(true, 1.0f);
printf("Direct access: %d, %f\n", cuda::std::get<0>(my_tup), cuda::std::get<1>(my_tup));
auto [first, second] = my_tup;
printf("Structured binding: %d, %f\n", first, second);
}
int main()
{
std_tuple_kernel<<<1, 1>>>();
CHECK_CUDA(cudaDeviceSynchronize());
cuda_tuple_kernel<<<1, 1>>>();
CHECK_CUDA(cudaDeviceSynchronize());
}
Compile command:
nvcc tuple_issues.cu --expt-relaxed-constexpr -std=c++17 -gencode=arch=compute_80,code=compute_80
Compile error:
tuple_issues.cu(22): error: cannot bind to non-public member "cuda::std::__4::tuple<_Tp...>::__base_ [with _Tp=<__nv_bool, float>]"
System Info: GPU: A100 nvcc: 11.7.64 g++: 9.4.0 OS: Ubuntu 20 LTS
Hm, confirmed: https://godbolt.org/z/YPdYdaGTW
@wmaxey is this a known problem? It looks like our structured binding tests for tuple
are commented out: https://github.com/NVIDIA/libcudacxx/blob/4b1a1df666f3bac26f60cb06cda051ca6fc9b158/.upstream-tests/test/std/utilities/tuple/tuple.tuple/tuple.helper/tuple_size_structured_bindings.pass.cpp#L49-L67
The comment indicates a possible compiler bug as well.
@miscco you've been looking at tuple
stuff lately, so you may be able to help look into this as well.
Interesting, I've been running into general std::tuple
compile errors in nvcc as well. For example:
template <typename T>
class TestTupleMember
{
public:
// remove constexpr to fail compile on nvcc >= 11.7)
constexpr __host__ __device__ TestTupleMember(T _data)
{
data = _data;
}
T data;
};
__global__ void make_tuple_kernel()
{
auto my_tup = std::make_tuple(TestTupleMember(1), TestTupleMember(2));
auto [first, second] = my_tup;
printf("Structured binding: %d, %d\n", first.data, second.data);
}
The above code fails to compile unless I add a constexpr
in front of the __host__
statement, but only for 11.7 and up. 11.6.2 compiles successfully. I know this might be out of scope for libcudacxx, but maybe it helps track down the compiler issue.
Note that this succeeds if I use std::make_tuple
with a basic type. It's only a class or struct that fails to compile.
Hm, confirmed: https://godbolt.org/z/YPdYdaGTW
@wmaxey is this a known problem? It looks like our structured binding tests for
tuple
are commented out:https://github.com/NVIDIA/libcudacxx/blob/4b1a1df666f3bac26f60cb06cda051ca6fc9b158/.upstream-tests/test/std/utilities/tuple/tuple.tuple/tuple.helper/tuple_size_structured_bindings.pass.cpp#L49-L67
The comment indicates a possible compiler bug as well.
@miscco you've been looking at
tuple
stuff lately, so you may be able to help look into this as well.
So this is a "I should consider gardening" moment.
The issue at hand is that structured bindings only work when the respective tuple machinery is declared in namespace std.
You can have a look at this here https://godbolt.org/z/h438MMfhM