thrust
thrust copied to clipboard
inclusive_scan passing invalid data to AssociativeOperator
The thrust::inclusive_scan
passes random data not included in the input vector/iterators to the AssociateOperator functor parameter. The following code illustrates the issue.
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/fill.h>
#include <thrust/scan.h>
#include <thrust/count.h>
int main(int argc, const char **argv)
{
int count = argc > 1 ? std::atoi(argv[1]) : 1537; // 1536 and below work ok
thrust::device_vector<int> input(count);
thrust::fill(thrust::device, input.begin(), input.end(), 8);
thrust::device_vector<int> output(count);
thrust::inclusive_scan(thrust::device, input.begin(), input.end(), output.begin(),
[] __device__(auto const &lhs, auto const &rhs) {
if (lhs != 8 || rhs != 8)
printf("(%d,%d)\n", lhs, rhs);
return lhs < rhs ? lhs : rhs;
});
auto result = thrust::count_if(thrust::device, output.begin(), output.end(),
[] __device__(auto const &val) { return val != 8; });
cudaStreamSynchronize(0);
printf("result = %d\n", (int)result);
return 0;
}
The count
value must be at least 1537 to see the issue.
The input
vector is initialized to all 8s. When inclusive_scan
is called, the printf
will only fire if a passed in value is not 8
. The result from this call does not appear to be used since the scan result is correct as verified by the count_if
checking for only 8s in the result.
For a more complicated data element, the AssociativeOperator may crash if invalid data is passed. For example, the following structure contains a pointer to device memory. The pointers are all valid in the input vector but the AssociativeOperator functor is sometimes passed random data.
struct mydata
{
int const *a{};
int b{};
mutable int c{};
mydata() = default;
~mydata() = default;
mydata(const mydata &) = default;
mydata(mydata &&) = default;
mydata &operator=(const mydata &) = default;
mydata &operator=(mydata &&) = default;
__host__ __device__ mydata(int *a, int b) : a(a), b(b) {}
__device__ bool operator<(const mydata &rhs) const
{
// accessing '*a' may crash here
return b < rhs.b;
}
};
int main(int argc, const char **argv)
{
int count = argc > 1 ? std::atoi(argv[1]) : 385; // 384 and below are ok
thrust::device_vector<int> adata(1, 7);
auto d_adata = adata.data().get();
thrust::device_vector<mydata> input(count);
thrust::fill(thrust::device, input.begin(), input.end(), mydata{d_adata, 8});
thrust::device_vector<mydata> output(count);
thrust::inclusive_scan(thrust::device, input.begin(), input.end(), output.begin(),
[d_adata] __device__(mydata const &lhs, mydata const &rhs) {
if (lhs.a != d_adata || lhs.b != 8 || rhs.a != d_adata || rhs.b != 8)
printf("(%p,%d),(%p,%d)\n", lhs.a, lhs.b, rhs.a, rhs.b);
return lhs < rhs ? lhs : rhs;
});
auto result = thrust::count_if(thrust::device, output.begin(), output.end(),
[d_adata] __device__(mydata const val) { return val.a != d_adata || val.b != 8; });
cudaStreamSynchronize(0);
printf("result = %d\n", (int)result);
return 0;
}
The count
value here must be at least 385 to see the issue.
The printf results for the above example show up nullptrs
being passed but in our larger RAPIDS/libcudf application, some objects passed actually contain invalid, non-null pointers.
The command line compile options are nvcc -std=c++17 --expt-extended-lambda inclscan.cu -o inclscan
This issue was reproduced on nvcc 11.0.221, gcc 9.3.0, ubuntu-1804, with the builtin thrust as well as the latest from the main branch downloaded today. I've also reproduced this on nvcc 11.1.105 and nvcc 11.2.142
For sanity, can you try replacing the __device__
lambda with a function object?
For sanity, can you try replacing the
__device__
lambda with a function object?
Like this?
struct int_fn
{
__device__ int operator()(int lhs, int rhs)
{
if (lhs != 8 || rhs != 8)
printf("(%d,%d)\n", lhs, rhs);
return lhs < rhs ? lhs : rhs;
}
};
...
thrust::inclusive_scan(thrust::device, input.begin(), input.end(), output.begin(), int_fn{});
That behaves the same as the lambda.
Yeah, any time I see wonky behavior in Thrust with a device lambda, the first thing I try is using a function object instead. Looks like that didn't apply here though.
Interesting -- I remember fixing this issue in 7e6f33b167b4f3b5eff01eb3ded96b3693f5a162, which was caught by this test. Unfortunately, that only tests a trivially small input and needs a much clearer failure behavior.
We should a new test case similar to @davidwendt's example above that uses DECLARE_VARIABLE_UNITTEST
to cover more sizes/types.