
Insufficient shared memory available on the GPU when using python binding on A100

Open WanqiYuan opened this issue 3 months ago • 1 comment

Hi there,

Thanks for releasing such amazing code. I tried to replace several MLPs in my code with FullyFusedMLP via the tcnn Python bindings. I set the number of neurons to 64 and am running on an A100. I compiled the Python bindings on the A100 following the instructions, but I got the following error:

FullyFusedMLP: insufficient shared memory available on the GPU. Reduce n_neurons or use CutlassMLP (better compatibility but slower) instead.

I would expect the A100 to have plenty of shared memory, and neither the number of layers nor the number of neurons in the FullyFusedMLP is large. Am I missing something when compiling?
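For reference, here is a minimal sketch of roughly how I set up the network (the input/output dimensions and layer count are placeholders, not my exact values):

```python
import torch
import tinycudann as tcnn

# Sketch of the configuration described above: a FullyFusedMLP
# with 64 neurons per hidden layer (other fields are assumed defaults).
network_config = {
    "otype": "FullyFusedMLP",
    "activation": "ReLU",
    "output_activation": "None",
    "n_neurons": 64,
    "n_hidden_layers": 2,   # placeholder, not my exact depth
}

mlp = tcnn.Network(
    n_input_dims=32,        # placeholder
    n_output_dims=16,       # placeholder
    network_config=network_config,
)

x = torch.rand(1024, 32, device="cuda")
y = mlp(x)  # the shared-memory error above is raised when this network is created/used
```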

Thanks!

WanqiYuan · Aug 17 '25