
[Cuda Codegen] Emit launch bounds

Open thetheodor opened this issue 7 years ago • 4 comments

CUDA functions can be annotated with launch bounds, that is, the maximum number of threads per block (the minimum number of blocks per multiprocessor can also be specified). nvrtc/nvcc use this information during register allocation (and probably in other phases as well).

thetheodor avatar Jun 19 '18 12:06 thetheodor
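For readers unfamiliar with the annotation, here is a minimal sketch of what emitting launch bounds from a code generator could look like. The helper name `emitKernelPrefix` and its parameters are hypothetical and are not TensorComprehensions' actual codegen API; only the `__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)` qualifier and its placement between the return type and the kernel name come from CUDA itself.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (an assumption, not the project's real API).
// Builds the start of a CUDA kernel declaration annotated with
// __launch_bounds__. The second argument of the qualifier is optional
// in CUDA, so it is only emitted when a non-zero value is requested.
std::string emitKernelPrefix(
    const std::string& kernelName,
    unsigned maxThreadsPerBlock,
    unsigned minBlocksPerMultiprocessor = 0) {
  std::ostringstream os;
  os << "__global__ void __launch_bounds__(" << maxThreadsPerBlock;
  if (minBlocksPerMultiprocessor > 0) {
    os << ", " << minBlocksPerMultiprocessor;
  }
  os << ") " << kernelName;
  return os.str();
}
```

Calling `emitKernelPrefix("kernel", 256, 2)` would yield `__global__ void __launch_bounds__(256, 2) kernel`, to which a generator would then append the parameter list and body.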

Did you check what happens if somebody manually maps to .mapToThreads(32,0,0)?

ftynse avatar Jun 19 '18 15:06 ftynse

Fixed.

thetheodor avatar Jun 20 '18 07:06 thetheodor

On Tue, Jun 19, 2018 at 08:09:18AM -0700, ftynse wrote:

> did you check what happens if somebody manually maps to .mapToThreads(32,0,0) ?

Does that make any sense? Surely, the kernel is not going to run at all in that case, so why bother with special cases for this situation?

skimo

skimo-openhub avatar Jun 20 '18 07:06 skimo-openhub
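The concern in this exchange is that a thread mapping such as (32,0,0) makes the product of the block sizes zero, which is not a meaningful thread bound. One possible way to handle it, sketched below, is to skip the annotation whenever any size is zero. This is only an illustration under stated assumptions: the helper `launchBoundFromBlock` is hypothetical, it requires C++17 for `std::optional`, and it is not necessarily the fix that was applied in this thread.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical helper (an assumption, not the project's real API).
// Returns the number of threads per block implied by the three block
// sizes, or std::nullopt if any size is zero. A caller could then omit
// __launch_bounds__ entirely instead of emitting a bound of 0.
std::optional<std::uint64_t> launchBoundFromBlock(
    const std::array<std::uint64_t, 3>& blockSizes) {
  std::uint64_t product = 1;
  for (std::uint64_t size : blockSizes) {
    if (size == 0) {
      return std::nullopt;  // degenerate mapping: no sensible bound
    }
    product *= size;
  }
  return product;
}
```

With this sketch, a mapping of (32,4,1) gives a bound of 128, while (32,0,0) gives no bound at all, so the generated kernel would simply carry no launch-bounds annotation.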

Oh, this test is failing for me as well. However, if I dump the CUDA code and compile it with nvcc, I see no error.

thetheodor avatar Jun 20 '18 07:06 thetheodor