Insufficient shared memory available on the GPU when using python binding on A100
Hi there,
Thanks for releasing such an amazing code. I tried to replace several MLPs in my code with the Fully-fused MLP using the tcnn python binding. I set the number of neurons to 64 and am running on an A100. I compiled the python binding on the A100 following the instructions. However, I got an error:
FullyFusedMLP: insufficient shared memory available on the GPU. Reduce n_neurons or use CutlassMLP (better compatibility but slower) instead.
I thought the A100 should have plenty of shared memory, and neither the layer count nor the number of neurons of the Fully-fused MLP is large. Am I missing something when compiling?
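For context, here is a minimal sketch of the kind of network configuration involved. The field names follow tiny-cuda-nn's documented JSON config schema; the helper function and its fallback flag are hypothetical illustrations, not part of the library:

```python
# Sketch of a tiny-cuda-nn network config (the dict you would pass as
# network_config to tcnn.Network). The fallback helper is a hypothetical
# workaround mirroring the error message's suggestion, not a library API.

def make_network_config(n_neurons=64, n_hidden_layers=2, use_fallback=False):
    """Build a network config; switch to CutlassMLP when FullyFusedMLP
    reports insufficient shared memory."""
    return {
        # CutlassMLP is the more compatible (but slower) alternative
        # that the error message recommends.
        "otype": "CutlassMLP" if use_fallback else "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": n_neurons,
        "n_hidden_layers": n_hidden_layers,
    }
```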
Thanks!
Could you share the full compilation log (delete the package and recompile from scratch)? While doing so, ensure that the target compute capability appears as 80 (corresponding to the A100) in the log.
Please also run `python3 samples/mlp_learning_an_image_pytorch.py --config your-config.json` to try isolating the issue outside of your code base. You can adapt `data/config.json` with your particular MLP configuration to generate `your-config.json`.
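One way to generate such a file might look like the following sketch. The baseline dict abbreviates the structure of tiny-cuda-nn's sample `data/config.json` (field names follow the library's documented schema; the exact values here are placeholders to be replaced with your failing MLP configuration):

```python
import json

# Abbreviated baseline mirroring the layout of data/config.json
# (placeholder values -- substitute your own settings).
config = json.loads("""{
    "loss": {"otype": "RelativeL2"},
    "optimizer": {"otype": "Adam", "learning_rate": 1e-3},
    "encoding": {"otype": "HashGrid"},
    "network": {
        "otype": "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 64,
        "n_hidden_layers": 2
    }
}""")

# Insert the MLP configuration that triggers the error.
config["network"]["n_neurons"] = 64

with open("your-config.json", "w") as f:
    json.dump(config, f, indent=4)
```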
Thanks!