tiny-cuda-nn
Supported GPUs unclear
I have tried using tiny-cuda-nn with n_neurons=128 on several GPUs, with the following results (tiny-cuda-nn was compiled separately on each system with the appropriate value of TCNN_MIN_GPU_ARCH).
| GPU | FullyFusedMLP | CutlassMLP |
| --- | --- | --- |
| RTX 3090 | Works | Works |
| RTX 3070M | Works | Works |
| RTX 2070 SUPER | Produces zeros as output | Crash |
| GTX 1060M | Not supported (as expected) | Crash |
The documentation says:

> The fully fused MLP component of this framework requires a very large amount
> of shared memory in its default configuration.
> It will likely only work on an RTX 3090, an RTX 2080 Ti, or higher-end GPUs.
Why is this the case? Looking at the CUDA documentation, the maximum amount of shared memory per SM/thread block depends on the GPU's compute capability/architecture, not on the card's market tier (low-end vs. high-end).
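For reference, the shared-memory limits for the cards above can be queried directly from the CUDA runtime. This is a minimal sketch (not part of tiny-cuda-nn) that prints the per-block and per-SM shared memory reported by `cudaGetDeviceProperties`; per the CUDA programming guide, these limits are indeed determined by compute capability (e.g. the RTX 2070 SUPER and 2080 Ti are both CC 7.5 and report the same limits), which is why the "high-end vs. low-end" wording in the docs is confusing:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found.\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, dev);
        std::printf("%s (compute capability %d.%d)\n",
                    prop.name, prop.major, prop.minor);
        // Default static limit per thread block (48 KiB on most architectures).
        std::printf("  shared mem per block (default): %zu KiB\n",
                    prop.sharedMemPerBlock / 1024);
        // Larger limit available via cudaFuncSetAttribute opt-in.
        std::printf("  shared mem per block (opt-in):  %zu KiB\n",
                    prop.sharedMemPerBlockOptin / 1024);
        // Total shared memory available on each SM.
        std::printf("  shared mem per SM:              %zu KiB\n",
                    prop.sharedMemPerMultiprocessor / 1024);
    }
    return 0;
}
```

Compile with `nvcc query_smem.cu -o query_smem` and run on each system; comparing the opt-in per-block figure across the cards in the table would show whether the FullyFusedMLP failures line up with a shared-memory limit or with something else.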