cub
Fix `BlockRadixRankMatchEarlyCounts` or constrain it
Currently, `BlockRadixRankMatchEarlyCounts` doesn't work when `(1 << RADIX_BITS) % WARP_THREADS != 0`, that is, when the number of radix digits is not a multiple of the warp size. Either this case should be supported, or the structure should get a static assert that validates its template arguments. To make the structure conforming, we might as well provide `BlockDimY` and `BlockDimZ` template parameters.
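For illustration, the "constrain it" option could look roughly like the sketch below. It is not the actual CUB code: the struct name `BlockRadixRankMatchEarlyCountsSketch`, the helper `radix_digits_fit_warps`, the parameter order, and the hard-coded warp size of 32 are all assumptions made for the example, and the ranking logic itself is omitted.

```cpp
// Hypothetical sketch only: names, parameter order and the warp size
// constant are assumptions, not the real CUB definitions.
constexpr int kAssumedWarpThreads = 32;

// True when the number of radix digits (1 << RADIX_BITS) is a whole
// multiple of the warp size, i.e. the case that works today.
template <int RADIX_BITS, int WARP_THREADS = kAssumedWarpThreads>
constexpr bool radix_digits_fit_warps()
{
    return ((1 << RADIX_BITS) % WARP_THREADS) == 0;
}

template <int BLOCK_DIM_X,
          int RADIX_BITS,
          int BLOCK_DIM_Y = 1, // assumed default, mirroring other block-wide primitives
          int BLOCK_DIM_Z = 1>
struct BlockRadixRankMatchEarlyCountsSketch
{
    static constexpr int BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z;
    static constexpr int RADIX_DIGITS  = 1 << RADIX_BITS;

    // Reject unsupported template arguments at compile time instead of
    // silently producing wrong ranks.
    static_assert(radix_digits_fit_warps<RADIX_BITS>(),
                  "(1 << RADIX_BITS) must be a multiple of WARP_THREADS");

    // ... ranking implementation omitted ...
};
```

With this guard, `BlockRadixRankMatchEarlyCountsSketch<128, 8>` compiles (256 digits, a multiple of 32), while `BlockRadixRankMatchEarlyCountsSketch<128, 4>` fails at compile time with the message above (16 digits).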