thrust icon indicating copy to clipboard operation
thrust copied to clipboard

inclusive_scan passing invalid data to AssociativeOperator

Open davidwendt opened this issue 3 years ago • 4 comments

The thrust::inclusive_scan passes random data not included in the input vector/iterators to the AssociateOperator functor parameter. The following code illustrates the issue.

#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/fill.h>
#include <thrust/scan.h>
#include <thrust/count.h>

int main(int argc, const char **argv)
{
    int count = argc > 1 ? std::atoi(argv[1]) : 1537; // 1536 and below work ok
    thrust::device_vector<int> input(count);
    thrust::fill(thrust::device, input.begin(), input.end(), 8);

    thrust::device_vector<int> output(count);
    thrust::inclusive_scan(thrust::device, input.begin(), input.end(), output.begin(),
                           [] __device__(auto const &lhs, auto const &rhs) {
                               if (lhs != 8 || rhs != 8)
                                   printf("(%d,%d)\n", lhs, rhs);
                               return lhs < rhs ? lhs : rhs;
                           });

    auto result = thrust::count_if(thrust::device, output.begin(), output.end(),
                                   [] __device__(auto const &val) { return val != 8; });
    cudaStreamSynchronize(0);
    printf("result = %d\n", (int)result);
    return 0;
}

The count value must be at least 1537 to see the issue.

The input vector is initialized to all 8s. When inclusive_scan is called, the printf will only fire if a passed in value is not 8. The result from this call does not appear to be used since the scan result is correct as verified by the count_if checking for only 8s in the result.

For a more complicated data element, the AssociativeOperator may crash if invalid data is passed. For example, the following structure contains a pointer to device memory. The pointers are all valid in the input vector but the AssociativeOperator functor is sometimes passed random data.

struct mydata
{
    int const *a{};
    int b{};
    mutable int c{};

    mydata() = default;
    ~mydata() = default;
    mydata(const mydata &) = default;
    mydata(mydata &&) = default;
    mydata &operator=(const mydata &) = default;
    mydata &operator=(mydata &&) = default;

    __host__ __device__ mydata(int *a, int b) : a(a), b(b) {}

    __device__ bool operator<(const mydata &rhs) const
    {
        // accessing '*a' may crash here
        return b < rhs.b;
    }
};

int main(int argc, const char **argv)
{
    int count = argc > 1 ? std::atoi(argv[1]) : 385; // 384 and below are ok

    thrust::device_vector<int> adata(1, 7);
    auto d_adata = adata.data().get();

    thrust::device_vector<mydata> input(count);
    thrust::fill(thrust::device, input.begin(), input.end(), mydata{d_adata, 8});

    thrust::device_vector<mydata> output(count);
    thrust::inclusive_scan(thrust::device, input.begin(), input.end(), output.begin(),
                           [d_adata] __device__(mydata const &lhs, mydata const &rhs) {
                               if (lhs.a != d_adata || lhs.b != 8 || rhs.a != d_adata || rhs.b != 8)
                                   printf("(%p,%d),(%p,%d)\n", lhs.a, lhs.b, rhs.a, rhs.b);
                               return lhs < rhs ? lhs : rhs;
                           });

    auto result = thrust::count_if(thrust::device, output.begin(), output.end(),
                                   [d_adata] __device__(mydata const val) { return val.a != d_adata || val.b != 8; });

    cudaStreamSynchronize(0);
    printf("result = %d\n", (int)result);
    return 0;
}

The count value here must be at least 385 to see the issue.

The printf results for the above example show up nullptrs being passed but in our larger RAPIDS/libcudf application, some objects passed actually contain invalid, non-null pointers.

The command line compile options are nvcc -std=c++17 --expt-extended-lambda inclscan.cu -o inclscan

This issue was reproduced on nvcc 11.0.221, gcc 9.3.0, ubuntu-1804, with the builtin thrust as well as the latest from the main branch downloaded today. I've also reproduced this on nvcc 11.1.105 and nvcc 11.2.142

davidwendt avatar Jul 08 '21 16:07 davidwendt

For sanity, can you try replacing the __device__ lambda with a function object?

jrhemstad avatar Jul 08 '21 16:07 jrhemstad

For sanity, can you try replacing the __device__ lambda with a function object?

Like this?

struct int_fn
{
    __device__ int operator()(int lhs, int rhs)
    {
        if (lhs != 8 || rhs != 8)
            printf("(%d,%d)\n", lhs, rhs);
        return lhs < rhs ? lhs : rhs;
    }
};

...
    thrust::inclusive_scan(thrust::device, input.begin(), input.end(), output.begin(), int_fn{});

That behaves the same as the lambda.

davidwendt avatar Jul 08 '21 17:07 davidwendt

Yeah, any time I see wonky behavior in Thrust with a device lambda, the first thing I try is using a function object instead. Looks like that didn't apply here though.

jrhemstad avatar Jul 08 '21 18:07 jrhemstad

Interesting -- I remember fixing this issue in 7e6f33b167b4f3b5eff01eb3ded96b3693f5a162, which was caught by this test. Unfortunately, that only tests a trivially small input and needs a much clearer failure behavior.

We should a new test case similar to @davidwendt's example above that uses DECLARE_VARIABLE_UNITTEST to cover more sizes/types.

alliepiper avatar Jul 09 '21 14:07 alliepiper