thrust
thrust copied to clipboard
CUDA-MEMCHECK reports Uninitialized access
Hello, thrust team,
I found CUDA-MEMCHECK reported uninitialized access of thrust calls. I made a simple reproducer attached below. Please take a look. This is with cuda 10.1 on P100.
Thanks
-Ruipeng
#include <thrust/extrema.h>
#define N 64
int main()
{
int *h_x, *d_x, i, k;
h_x = (int *) malloc(N*sizeof(int));
for (i = 0; i < N; i++)
{
h_x[i] = i;
}
cudaMalloc(&d_x, N*sizeof(int));
cudaMemcpy(d_x, h_x, N*sizeof(int), cudaMemcpyHostToDevice);
int *ptr = thrust::max_element( thrust::device, d_x, d_x + N);
cudaMemcpy(&k, ptr, sizeof(int), cudaMemcpyDeviceToHost);
printf("max = %d\n", k);
cudaFree(d_x);
free(h_x);
return 0;
}
Output
li50@ray15:~/workspace/tmp/GPU/thrust$ cuda-memcheck --tool initcheck ./a.out
========= CUDA-MEMCHECK
========= Host API memory access error at host access to 0x100030e00600 of size 16 bytes
========= Uninitialized access at 0x100030e00600 on access by cudaMemcopy source.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/lib64/libcuda.so.1 (cuMemcpyDtoHAsync_v2 + 0x1e4) [0x2ec984]
========= Host Frame:./a.out [0x5357c]
========= Host Frame:./a.out [0x208cc]
========= Host Frame:./a.out [0x787a8]
========= Host Frame:./a.out [0x117f8]
========= Host Frame:./a.out [0x112dc]
========= Host Frame:./a.out [0x10f00]
========= Host Frame:./a.out [0x10608]
========= Host Frame:./a.out [0x10084]
========= Host Frame:./a.out [0x100e0]
========= Host Frame:./a.out [0xe328]
========= Host Frame:./a.out [0xe38c]
========= Host Frame:./a.out [0xcdb0]
========= Host Frame:./a.out [0xc524]
========= Host Frame:./a.out [0xa388]
========= Host Frame:./a.out [0xa314]
========= Host Frame:./a.out [0x9db8]
========= Host Frame:./a.out [0x4e0c]
========= Host Frame:/lib64/libc.so.6 [0x25100]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xc4) [0x252f4]
=========
max = 63
========= ERROR SUMMARY: 1 error
I confirm this behavior on cuda 10.2.
I can confirm this behavior in CUDA 11 w/ a V100. I notice this issue was opened exactly 1 year ago. Has anyone found a fix for this yet? I've tried explicitly syncing the stream I'm using in the policy but I still get the error.
I'm experiencing this w/ the following code:
thrust::device_ptr<value_idx> d_ptr_indptr =
thrust::device_pointer_cast(indptr);
thrust::device_ptr<value_idx> d_ptr_diff =
thrust::device_pointer_cast(diff.data());
ML::thrustAllocatorAdapter alloc(allocator, stream);
thrust::adjacent_difference(thrust::cuda::par(alloc).on(stream),
d_ptr_indptr, d_ptr_indptr + n_rows, d_ptr_diff);
return *(thrust::max_element(thrust::cuda::par(alloc).on(stream), d_ptr_diff,
d_ptr_diff + n_rows));
Just confirmed this behavior on CUDA 10.2 for thrust::min_max
Looks like this fell through the cracks. I'll try to look at this soon.
Likely related to NVIDIA/cub#184.