[REQ] Make __launch_bounds__ configurable via wp.kernel

Open nvtw opened this issue 2 months ago • 1 comments

Description

Expose CUDA's __launch_bounds__ feature in Warp.

Proposed API: @wp.kernel(enable_backward=False, launch_bounds=(a, b))

See: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#launch-bounds
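To illustrate the idea, here is a minimal sketch of how a code generator could turn a launch_bounds value into the CUDA qualifier. The helper name launch_bounds_qualifier is hypothetical and this is not Warp's actual implementation. It only assumes that __launch_bounds__ takes maxThreadsPerBlock followed by an optional minBlocksPerMultiprocessor, as described in the CUDA programming guide.

```python
def launch_bounds_qualifier(launch_bounds):
    """Hypothetical helper: build a CUDA __launch_bounds__ qualifier string.

    Accepts None (no qualifier), an int (maxThreadsPerBlock), or a tuple
    (maxThreadsPerBlock, minBlocksPerMultiprocessor).
    """
    if launch_bounds is None:
        return ""  # emit nothing and let the compiler use its defaults
    if isinstance(launch_bounds, int):
        launch_bounds = (launch_bounds,)
    if not 1 <= len(launch_bounds) <= 2:
        raise ValueError("launch_bounds expects 1 or 2 values")
    if any(not isinstance(v, int) or v <= 0 for v in launch_bounds):
        raise ValueError("launch_bounds values must be positive integers")
    return "__launch_bounds__(" + ", ".join(str(v) for v in launch_bounds) + ")"


# The qualifier would sit in front of the generated kernel definition.
print(launch_bounds_qualifier((256, 2)))  # prints: __launch_bounds__(256, 2)
```

Accepting either a single int or a tuple keeps the common case (only capping threads per block) short.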

Context

This gives the user more control over how many registers are allocated for their kernel. It's mainly an optional expert tool for optimization.
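As a rough back-of-the-envelope sketch of why launch bounds affect register allocation: for the requested number of blocks to be resident on one multiprocessor, all of their threads have to share that multiprocessor's register file. The numbers below (65536 registers per multiprocessor, at most 255 registers per thread) are assumptions typical of recent NVIDIA GPUs. The function register_budget is hypothetical, and the real compiler also accounts for allocation granularity, so treat the results as estimates only.

```python
def register_budget(max_threads_per_block, min_blocks_per_sm=1,
                    regs_per_sm=65536, max_regs_per_thread=255):
    """Estimate the per-thread register limit implied by launch bounds.

    regs_per_sm and max_regs_per_thread are assumed hardware limits,
    not values queried from a device.
    """
    resident_threads = max_threads_per_block * min_blocks_per_sm
    return min(max_regs_per_thread, regs_per_sm // resident_threads)


for bounds in [(256, 1), (256, 2), (512, 2)]:
    print(bounds, "->", register_budget(*bounds), "registers per thread")
# (256, 1) -> 255 registers per thread
# (256, 2) -> 128 registers per thread
# (512, 2) -> 64 registers per thread
```

Tighter bounds shrink the budget, which can raise occupancy but may also force the compiler to spill registers.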

nvtw, Oct 23 '25 15:10

It turns out that launch_bounds could be combined with a new CUDA 13 feature, shared memory register spilling (https://developer.nvidia.com/blog/how-to-improve-cuda-kernel-performance-with-shared-memory-register-spilling/), to get even more control over register usage. Adding a Warp call that emits asm volatile (".pragma \"enable_smem_spilling\";"); under the hood looks doable without too much effort. Alternatively, one more option in @wp.kernel would probably work too; the code generator would then insert that asm line as the first statement in the kernel.
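The second option could look roughly like the sketch below, where a code generator prepends the pragma to the kernel body when a flag is set. The names emit_kernel_body and enable_smem_spilling are hypothetical, and the kernel statements are placeholders; this does not reflect how Warp's code generator is actually structured.

```python
# The asm line from the comment above, written as a Python string literal.
SMEM_SPILLING_PRAGMA = 'asm volatile (".pragma \\"enable_smem_spilling\\";");'


def emit_kernel_body(statements, enable_smem_spilling=False, indent="    "):
    """Hypothetical codegen step: join kernel statements into a body string.

    When enable_smem_spilling is True, the pragma is inserted as the
    first statement of the kernel.
    """
    lines = list(statements)
    if enable_smem_spilling:
        lines.insert(0, SMEM_SPILLING_PRAGMA)
    return "\n".join(indent + line for line in lines)


body = emit_kernel_body(
    [
        "int i = blockIdx.x * blockDim.x + threadIdx.x;",
        "out[i] = in[i] * 2.0f;",
    ],
    enable_smem_spilling=True,
)
print(body)
```

Keeping the pragma behind an opt-in flag leaves the generated code unchanged for kernels that do not request it.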

nvtw, Oct 27 '25 06:10

Added in eddb998a01a55e711d692a4a62003f18f238bd31

shi-eric, Dec 09 '25 22:12