TensorComprehensions [Cuda Codegen] Emit launch bounds

Open thetheodor opened this issue 7 years ago • 4 comments

CUDA kernels can be annotated with launch bounds (the __launch_bounds__ qualifier): the maximum number of threads per block and, optionally, the minimum number of blocks per multiprocessor. nvrtc/nvcc use this information during register allocation (and probably in other phases as well).
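For illustration, here is a minimal sketch of what emitting the annotation could look like in a C++ code generator. The helper name emitKernelSignature, the fixed parameter list, and the use of a plain 3-element array for the block sizes are assumptions made for this example; they are not the TensorComprehensions API.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not the TensorComprehensions API): builds the
// signature of a CUDA kernel annotated with __launch_bounds__.
std::string emitKernelSignature(const std::string& name,
                                const std::array<std::size_t, 3>& block) {
  // The maximum number of threads per block is the product of the
  // three block dimensions the kernel will be launched with.
  const std::size_t maxThreads = block[0] * block[1] * block[2];

  std::ostringstream ss;
  // CUDA accepts the qualifier between the return type and the name:
  //   __global__ void __launch_bounds__(N) kernel(...)
  // The parameter list below is a placeholder for this sketch.
  ss << "__global__ void __launch_bounds__(" << maxThreads << ") " << name
     << "(float* out, const float* in)";
  return ss.str();
}
```

With block sizes (32, 4, 1) this produces a signature carrying __launch_bounds__(128). The optional second argument (minimum blocks per multiprocessor) is left out of the sketch.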

Jun 19 '18 12:06 thetheodor

Did you check what happens if somebody manually maps to .mapToThreads(32,0,0)?
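To make the concern concrete: if the zero sizes reached the emitter unchanged, a bound computed as the product of the block sizes would be 0. The sketch below shows one possible guard, which simply omits the annotation in that case. The helper name launchBoundsAnnotation and the choice to skip rather than clamp are assumptions for illustration; the thread does not say how the actual fix handles this.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper (an assumption, not the actual fix): returns the
// annotation text, or an empty string when the block is degenerate.
std::string launchBoundsAnnotation(const std::array<std::size_t, 3>& block) {
  const std::size_t maxThreads = block[0] * block[1] * block[2];
  if (maxThreads == 0) {
    // A zero-sized dimension gives a product of 0; emit no annotation
    // rather than a bound of 0.
    return "";
  }
  return "__launch_bounds__(" + std::to_string(maxThreads) + ")";
}
```

For (32, 1, 1) the helper returns __launch_bounds__(32); for (32, 0, 0) it returns an empty string, so the kernel would be emitted without any bound.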

Jun 19 '18 15:06 ftynse

Fixed.

Jun 20 '18 07:06 thetheodor

On Tue, Jun 19, 2018 at 08:09:18AM -0700, ftynse wrote:

> did you check what happens if somebody manually maps to .mapToThreads(32,0,0) ?

Does that make any sense? Surely, the kernel is not going to run at all in that case, so why bother with special cases for this situation?

skimo

Jun 20 '18 07:06 skimo-openhub

Oh, this test is failing for me as well. However, if I dump the generated CUDA and compile it with nvcc, I see no error.

Jun 20 '18 07:06 thetheodor
