tiny-cuda-nn
Insufficient shared memory available on the GPU when using python binding on A100
Hi there,
Thanks for releasing such an amazing codebase. I tried to replace several MLPs in my code with the fully fused MLP via the tcnn Python binding. I set the number of neurons to 64 and am running on an A100. I compiled the Python binding on the A100 following the instructions, but I get this error:
FullyFusedMLP: insufficient shared memory available on the GPU. Reduce n_neurons or use CutlassMLP (better compatibility but slower) instead.
I would expect the A100 to have plenty of shared memory, and neither the number of layers nor the number of neurons in the fully fused MLP is large. Am I missing something when compiling?
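For reference, this is roughly how I create the network (a minimal sketch; the input/output dimensions and layer count here are placeholders, not my exact values):

```python
import torch
import tinycudann as tcnn

# Illustrative dimensions only; my real model uses different ones.
n_input_dims, n_output_dims = 3, 16

network = tcnn.Network(
    n_input_dims=n_input_dims,
    n_output_dims=n_output_dims,
    network_config={
        "otype": "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 64,        # the width that triggers the error for me
        "n_hidden_layers": 2,   # placeholder depth
    },
)

x = torch.rand(1024, n_input_dims, device="cuda")
y = network(x)  # the shared-memory error shows up when the network is created/used
```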
Thanks!