
WARP_TIME_SLICING isn't supported in ScatterToStripedGuarded and ScatterToStripedFlagged

Open gevtushenko opened this issue 4 years ago • 0 comments

BlockExchange provides the template parameter WARP_TIME_SLICING, which reduces the shared memory footprint. Most of the algorithms in BlockExchange have specializations for each WARP_TIME_SLICING value, but ScatterToStripedGuarded and ScatterToStripedFlagged do not. Specifying WARP_TIME_SLICING=true leads to out-of-bounds accesses in these algorithms, because int item_offset = ranks[ITEM] isn't mapped to a slice-relative index. For example, ScatterToBlocked performs this kind of mapping in its specialization for WARP_TIME_SLICING=true:

int item_offset = ranks[ITEM] - SLICE_OFFSET;
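To make the role of that subtraction concrete, here is a minimal host-side sketch in plain C++. It is not CUB's code: the function name scatter_time_sliced, the parameter slice_items, and the sequential loops standing in for threads and barriers are all assumptions made for illustration. It also assumes ranks is a permutation of 0..N-1. The point is only that the staging buffer holds one slice, so a rank must be shifted by the slice offset and range-checked before it is used as an index.

```cpp
#include <vector>

// Hypothetical host-side model of a time-sliced scatter (NOT the CUB implementation).
// `slice_items` stands in for the number of items the reduced staging buffer can hold.
// Assumes `ranks` is a permutation of 0..items.size()-1.
std::vector<int> scatter_time_sliced(const std::vector<int>& items,
                                     const std::vector<int>& ranks,
                                     int slice_items)
{
    const int total = static_cast<int>(items.size());
    std::vector<int> output(total);
    std::vector<int> buffer(slice_items);  // smaller than `total` by design

    for (int slice_offset = 0; slice_offset < total; slice_offset += slice_items)
    {
        // Phase 1: each item checks whether its rank lands in the current slice.
        for (int i = 0; i < total; ++i)
        {
            // Map the global rank to a slice-local index.
            int item_offset = ranks[i] - slice_offset;

            // Without the subtraction and this guard, buffer[ranks[i]] would be
            // indexed past the end whenever ranks[i] >= slice_items.
            if (item_offset >= 0 && item_offset < slice_items)
            {
                buffer[item_offset] = items[i];
            }
        }

        // Phase 2: drain the staging buffer into this slice's output range.
        for (int j = 0; j < slice_items && slice_offset + j < total; ++j)
        {
            output[slice_offset + j] = buffer[j];
        }
    }
    return output;
}
```

With six items and slice_items = 2, the buffer has only two slots while ranks go up to 5, which is the situation where an unmapped item_offset = ranks[ITEM] would index out of bounds.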

By the way, there are no tests for these algorithms.

gevtushenko, Jun 02 '21 13:06