rocPRIM
Fix bug in device partition for large problem sizes
Previously, a check that should have been performed only at the last block across all launches was performed for the last block of each launch. This caused unintended effects, so the redundant checks are removed. This change has no measurable effect on performance. The bug could be reproduced with sizes `30064767271` and `17179866528` in the `LargeInputPartition` and `LargeInputPartitionThreeWay` tests.
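To illustrate the distinction described above, here is a minimal host-side sketch, not rocPRIM's actual implementation. It assumes a large input is processed in several launches of a bounded size and counts how many blocks would run the "last block" check under the old and the new condition. The names `count_last_block_checks` and `CheckCounts`, and the per-launch limit of 2^31 items used in the usage note, are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type: how many blocks run the "last block" check.
struct CheckCounts
{
    std::uint64_t per_launch; // old behavior: last block of every launch
    std::uint64_t global;     // new behavior: last block across all launches
};

// Splits total_size items into launches of at most max_items_per_launch
// items (an assumed limit) and counts the checks under both conditions.
CheckCounts count_last_block_checks(std::uint64_t total_size,
                                    std::uint64_t max_items_per_launch)
{
    assert(max_items_per_launch > 0);
    CheckCounts counts{0, 0};
    for(std::uint64_t offset = 0; offset < total_size; offset += max_items_per_launch)
    {
        const std::uint64_t launch_size
            = std::min(max_items_per_launch, total_size - offset);
        const bool is_last_launch = (offset + launch_size == total_size);

        // Old condition: the last block of each launch runs the check.
        ++counts.per_launch;

        // New condition: only the last block of the last launch runs it.
        if(is_last_launch)
        {
            ++counts.global;
        }
    }
    return counts;
}
```

With the size from the description and the assumed limit of 2^31 items per launch, `count_last_block_checks(30064767271ull, 1ull << 31)` reports 14 checks under the old condition and 1 under the new one, so 13 of the checks were redundant in this model.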
@stanleytsang-amd I can reproduce the bug reliably on `gfx906` and `gfx908` when letting `get_large_sizes` (not `get_sizes`) return size `30064767271` in `test/rocprim/test_device_partition.cpp`.
Rebased the PR.