thrust
reduce_by_key results are non-deterministic for floats
This is a follow up question from https://github.com/NVIDIA/thrust/issues/1587
@allisonvacanti @senior-zero
Like thrust::reduce(), thrust::reduce_by_key() also produces non-deterministic results for floating-point types, am I right?
In my limited testing, the code below, built with the CUDA 11.6 SDK, produced different results from run to run. You may need to run the program several times before the mismatch shows up. I also tested cub::DeviceReduce::ReduceByKey (with the 11.6 SDK). It does not provide the "run-to-run" determinism that the documentation describes.
Note: this issue is mainly about thrust; there is a similar issue for cub (https://github.com/NVIDIA/cub/issues/441).
```cpp
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/distance.h>
#include <thrust/fill.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <iostream>

int main() {
    auto const numElements = 250000;
    // Every key is 1, so reduce_by_key collapses the whole input into a
    // single segment whose value is the sum of 250000 copies of 0.1.
    thrust::device_vector<double> data(numElements, 0.1);
    thrust::device_vector<double> keys(numElements, 1);
    thrust::device_vector<double> keys_out1(numElements);
    thrust::device_vector<double> keys_out2(numElements);
    thrust::device_vector<double> out1(numElements);
    thrust::device_vector<double> out2(numElements);
    thrust::host_vector<double> hostOut1(numElements);
    thrust::host_vector<double> hostOut2(numElements);

    // Run the identical reduction twice on the same input.
    std::size_t num1 = thrust::distance(keys_out1.begin(),
        thrust::reduce_by_key(keys.begin(), keys.end(), data.begin(),
                              keys_out1.begin(), out1.begin()).first);
    std::size_t num2 = thrust::distance(keys_out2.begin(),
        thrust::reduce_by_key(keys.begin(), keys.end(), data.begin(),
                              keys_out2.begin(), out2.begin()).first);
    assert(num1 == num2);
    std::cout << "num = " << num1 << std::endl;

    // Copy the first num1 / num2 reduced values to the host
    thrust::copy(out1.begin(), out1.begin() + num1, hostOut1.begin());
    thrust::copy(out2.begin(), out2.begin() + num2, hostOut2.begin());

    // Check that the outputs are exactly (bitwise) the same
    bool mismatch = false;
    for (std::size_t i = 0; i < num1; i++) {
        if (hostOut1[i] != hostOut2[i]) {
            std::cout << "Element " << i << " is not equal: "
                      << std::setprecision(17) << hostOut1[i]
                      << " vs " << hostOut2[i] << std::endl;
            mismatch = true;
        }
    }
    return mismatch ? 1 : 0;
}
```
It looks like we'll need to update the docs here, similar to NVIDIA/thrust#1587. Thanks for pointing this out!