trove
Handle warp configurations for 2D and 3D blocks
I noticed that the warp_id calculation in much of the code assumes a 1D thread block. Is this an intended limitation of trove?
int warp_id = threadIdx.x & WARP_MASK;
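CUDA groups threads into warps by their linear index within the block (x varying fastest, then y, then z), so to cater for 2D and 3D blocks this should probably be: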
int warp_id = ((threadIdx.z*blockDim.y + threadIdx.y)*blockDim.x + threadIdx.x) & WARP_MASK;
This PR makes the above change, but I'm not sure it's worth going further if the algorithm internals themselves depend on a 1D thread block.