HVM adds dynamic shared mem allocation to cuda kernels

Open kings177 opened this issue 6 months ago • 0 comments

with this, hvm.cu should work on virtually every NVIDIA GPU out there (assuming compute capability 5.0 and above). it sizes shared memory based on the GPU's capabilities: specifically, 3072 bytes less than the max opt-in shared mem available, since some other shared arrays use roughly (a little less than) that amount of shared mem.
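a minimal sketch of the sizing rule above, as host-side C++ that also compiles under nvcc. the names `RESERVED_BYTES` and `local_net_budget` are hypothetical and not taken from hvm.cu, and the clamp to zero for very small inputs is an assumption. in a real probe the input would come from `cudaDeviceGetAttribute` with `cudaDevAttrMaxSharedMemoryPerBlockOptin`.

```cpp
#include <cstddef>

// Hypothetical name: bytes held back for the kernel's other shared arrays
// (the 3072 figure comes from the description above).
constexpr std::size_t RESERVED_BYTES = 3072;

// Hypothetical helper: given the device's max opt-in shared memory per block
// (in bytes), return how many bytes are left for the local net.
// Assumption: returns 0 instead of underflowing when the device reports
// no more than the reserved amount.
constexpr std::size_t local_net_budget(std::size_t max_optin_bytes) {
  return max_optin_bytes > RESERVED_BYTES ? max_optin_bytes - RESERVED_BYTES
                                          : 0;
}

// Compile-time checks on illustrative inputs.
static_assert(local_net_budget(48 * 1024) == 46080, "48 KiB minus 3072 bytes");
static_assert(local_net_budget(2048) == 0, "clamped instead of underflowing");
```

keeping the subtraction in one `constexpr` function means the same rule can be checked at compile time and reused by whatever emits the final value.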

since the shared mem size needs to be known at compile time, get_shared_mem.cu calculates the available shared mem at build time. it is run by build.rs, which then generates a header file, shared_mem_config.h, with the correct hex value for the local net.
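a rough sketch of the header-generation step, assuming the probe has already produced a byte count. the function name `render_shared_mem_header` and the macro name `SHARED_MEM_BUDGET` are hypothetical; the description does not say what the generated shared_mem_config.h actually defines, only that it holds a hex value.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render the text of a generated config header that
// exposes the shared-memory budget as a hex constant.
// Assumption: the macro name SHARED_MEM_BUDGET is illustrative only.
std::string render_shared_mem_header(std::size_t budget_bytes) {
  char define_line[64];
  // %zX prints a size_t as uppercase hexadecimal.
  std::snprintf(define_line, sizeof define_line,
                "#define SHARED_MEM_BUDGET 0x%zX\n", budget_bytes);
  return std::string("// auto-generated at build time, do not edit\n") +
         "#pragma once\n" + define_line;
}
```

for a budget of 46080 bytes this produces a `#define` with the value `0xB400`, which a kernel source can then use as a compile-time constant, for example when sizing a statically declared `__shared__` array.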

Closes: #283 and #314 (supposedly)

Aug 16 '24 22:08 kings177
