TensorComprehensions
[Cuda Codegen] Emit launch bounds
CUDA functions can be annotated with launch bounds: the maximum number of threads per block and, optionally, the minimum number of blocks per multiprocessor. nvrtc/nvcc use this information during register allocation (and probably in other phases as well).
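For illustration only, here is a minimal C++ sketch of how a code generator could derive the bound from the mapped block sizes and place it in the kernel signature. The helper names `threadsPerBlock` and `emitKernelSignature` are hypothetical and are not taken from this patch. Treating a zero extent as 1 is an assumption about one way to handle a mapping such as `(32,0,0)`, not a statement of what the patch does. Only the position of `__launch_bounds__` between the return type and the kernel name follows documented CUDA syntax.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: number of threads in a block given its three extents.
// Assumption: an extent of 0 is treated as 1, so that (32,0,0) yields 32
// rather than a bound of 0.
inline std::size_t threadsPerBlock(const std::array<std::size_t, 3>& block) {
  std::size_t n = 1;
  for (std::size_t extent : block) {
    n *= std::max<std::size_t>(extent, 1);
  }
  return n;
}

// Hypothetical helper: build a kernel signature annotated with launch bounds.
// CUDA expects the annotation between the return type and the kernel name.
inline std::string emitKernelSignature(
    const std::string& name,
    const std::string& params,
    const std::array<std::size_t, 3>& block) {
  return "__global__ void __launch_bounds__(" +
      std::to_string(threadsPerBlock(block)) + ") " + name + "(" + params + ")";
}
```

Under these assumptions, calling `emitKernelSignature("kernel", "float* out", {32, 8, 1})` would produce a signature carrying `__launch_bounds__(256)`.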
Did you check what happens if somebody manually maps to .mapToThreads(32,0,0)?
Fixed.
On Tue, Jun 19, 2018 at 08:09:18AM -0700, ftynse wrote:
> did you check what happens if somebody manually maps to
> .mapToThreads(32,0,0)?
Does that make any sense? Surely, the kernel is not going to run at all in that case, so why bother with special cases for this situation?
skimo
Oh, this test is failing for me as well. However, if I dump the generated CUDA code and compile it with nvcc, I see no error.