cub
WARP_TIME_SLICING isn't supported in ScatterToStripedGuarded and ScatterToStripedFlagged
BlockExchange provides a WARP_TIME_SLICING template parameter that reduces the shared memory footprint. Most of the algorithms in BlockExchange have specializations for the different WARP_TIME_SLICING values, but ScatterToStripedGuarded and ScatterToStripedFlagged do not. Specifying WARP_TIME_SLICING=true leads to out-of-bounds accesses in these algorithms, because int item_offset = ranks[ITEM] is used as is and never mapped to an index within the current time slice. For example, ScatterToBlocked performs this kind of mapping in its specialization for WARP_TIME_SLICING=true:
int item_offset = ranks[ITEM] - SLICE_OFFSET;
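To illustrate why the missing mapping matters, here is a minimal host-side C++ sketch of the index arithmetic. It is not CUB code: the tile configuration (4 warps of 32 threads, 4 items per thread) and the helper names `unmapped_in_bounds` and `map_to_slice` are hypothetical. It assumes the time-sliced buffer holds only one warp's worth of items, and that a rank falling outside the current slice is skipped.

```cpp
#include <cassert>

// Hypothetical tile configuration, chosen only for illustration.
constexpr int WARP_THREADS      = 32;
constexpr int WARPS             = 4;
constexpr int ITEMS_PER_THREAD  = 4;
constexpr int TILE_ITEMS        = WARPS * WARP_THREADS * ITEMS_PER_THREAD; // 512
// Assumption: with time slicing, the buffer holds one warp's items only.
constexpr int TIME_SLICED_ITEMS = WARP_THREADS * ITEMS_PER_THREAD;         // 128

// Using the rank directly as an index, as in `item_offset = ranks[ITEM]`:
// only ranks below TIME_SLICED_ITEMS land inside the smaller buffer.
constexpr bool unmapped_in_bounds(int rank)
{
    return rank >= 0 && rank < TIME_SLICED_ITEMS;
}

// Mapping the rank into a given slice, as in
// `item_offset = ranks[ITEM] - SLICE_OFFSET`.
// Returns -1 when the rank does not belong to this slice (assumed skipped).
constexpr int map_to_slice(int rank, int slice)
{
    const int slice_offset = slice * TIME_SLICED_ITEMS;
    const int item_offset  = rank - slice_offset;
    return (item_offset >= 0 && item_offset < TIME_SLICED_ITEMS) ? item_offset : -1;
}

// Small self-check of the arithmetic for one rank in the third slice.
inline void demo()
{
    const int rank = 300;                  // valid rank: 0 <= 300 < TILE_ITEMS
    assert(rank < TILE_ITEMS);
    assert(!unmapped_in_bounds(rank));     // direct indexing overruns the buffer
    assert(map_to_slice(rank, 2) == 44);   // 300 - 2 * 128
    assert(map_to_slice(rank, 0) == -1);   // not part of slice 0
}
```

Under these assumptions, every rank of 128 or more overruns the buffer when used directly, which is 384 of the 512 possible ranks in this configuration.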
By the way, there are no tests for these algorithms.