[BUG]: stdpar multicore scan tests fail at runtime after commit 1238f287f4f654bff12e4d4a29b7dfa1c40a12fe
Is this a duplicate?
- [x] I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Runtime Error
Component
Thrust
Describe the bug
stdpar multicore scan tests have started failing in correctness after the following commit: 1238f287f4f654bff12e4d4a29b7dfa1c40a12fe For this example:
#include <execution>
#include <algorithm>
#include <numeric>
#include <cassert>
constexpr int N = 10;
int main() {
int* data = new int[N];
int* output = new int[N];
std::iota(data, data + N, 0);
auto r0 = std::exclusive_scan(std::execution::par, data, data + N, output, 10);
assert(r0 == output + N);
delete[] output;
delete[] data;
}
The program fails at runtime with the following assertion error:
int main(): Assertion `r0 == output + N' failed.
How to Reproduce
nvc++ -stdpar=multicore -fast --c++17 scan.cpp
Expected behavior
The program should pass the assertion (r0 == output + N) and complete successfully.
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response
Hi @zkhatami!
Thanks for submitting this issue - the CCCL team has been notified and we'll get back to you as soon as we can! In the mean time, feel free to add any relevant information to this issue.
The fix doesn’t seem to be comprehensive enough, as this case still fails:
#include <algorithm>
#include <execution>
#include <functional>
#include <numeric>
#include <vector>
#include <random>
#include <cassert>
int main() {
std::mt19937 gen(1729);
std::vector<unsigned int> s(1729);
std::generate(s.begin(), s.end(), std::ref(gen));
std::vector<unsigned int> d(1729);
assert(d.end() == inclusive_scan(std::execution::par, s.cbegin(), s.cend(), d.begin(), std::multiplies<>{}));
std::partial_sum(s.cbegin(), s.cend(), s.begin(), std::multiplies<>{});
assert(s == d);
}
nvc++ -stdpar=multicore --c++17 test.cpp ; ./a.out
a.out: parallel_inclusive_scan.cpp:16: int main(): Assertion `s == d' failed.
@bernhardmgruber would you please take a look?
i think here as what i understand it is unsigned int{} which gets initialized to 0 producing an error.
maybe, if we use first element as the init value instead, it would work?
@charan-003 is absolutely right. You cannot assume that accum_t{} is a valid initial value for anything. If the element type is int and the binary operation is multiplication, then using accum_t{} will incorrectly set the entire result to zero. In fact you can't even assume that accum_t{} is well formed. There is no guarantee that the accumulator type has a default constructor.
@charan-003 is also correct that you can fix step 1 by using the first element as the initial value and reducing over the remaining elements. But the accum_t{} in step 2 is still a problem, and that one is harder to fix.
@dkolsen-pgi thanks for the reply.
I made few changes in this PR,I tested both test cases from the issue, and they're passing:
Test 1:
#include <execution>
#include <algorithm>
#include <numeric>
#include <cassert>
constexpr int N = 10;
int main() {
int* data = new int[N];
int* output = new int[N];
std::iota(data, data + N, 0);
auto r0 = std::exclusive_scan(std::execution::par, data, data + N, output, 10);
assert(r0 == output + N);
delete[] output;
delete[] data;
}
Test 2:
#include <algorithm>
#include <execution>
#include <functional>
#include <numeric>
#include <vector>
#include <random>
#include <cassert>
int main() {
std::mt19937 gen(1729);
std::vector<unsigned int> s(1729);
std::generate(s.begin(), s.end(), std::ref(gen));
std::vector<unsigned int> d(1729);
assert(d.end() == inclusive_scan(std::execution::par, s.cbegin(), s.cend(), d.begin(), std::multiplies<>{}));
std::partial_sum(s.cbegin(), s.cend(), s.begin(), std::multiplies<>{});
assert(s == d);
}
Just wondering if there are any other test cases I should consider, or if something else is missing
Just wondering if there are any other test cases I should consider, or if something else is missing
The original change overhauled the implementation of the scan algorithms in the Thrust OpenMP back end. Parallel scan implementations can be very tricky with nasty edge cases. To be confident that everything is correct requires thorough testing: inclusive_scan, transform_inclusive_scan (both of them with and without an initial value), exclusive_scan, and transform_exclusive_scan. Test with a non-commutative binary operator such as string concatenation. Test with a type whose default constructor gives the object a poison value to catch if default values are used anywhere. Test with the input and output being in the same location. (That can be particularly tricky for exclusive_scan, but I think this implementation gets that right.)
(I recently implemented parallel scan algorithms with OpenACC. It was quite challenging. A thorough test suite was extremely helpful.)
@charan-003 I added a few more tests to #6560. Please add further tests as you see fit!
@dkolsen-pgi @bernhardmgruber Thanks for the detailed feedback! I've added tests covering all the cases you mentioned.
Added:
- inclusive_scan and exclusive_scan (with and without init)
- transform_inclusive_scan and transform_exclusive_scan
- Non-additive operations (multiplication operator)
- In-place scans (input == output)
- Custom type with poison default constructor to catch uninitialized values
- Large arrays (10k elements), boundary cases (1024 at threshold), and small arrays (2 elements)
Happy to make more changes as required :))