tiny-cuda-nn
Supported GPUs unclear
I have tried using tiny-cuda-nn with n_neurons=128 on several GPUs, with the following results (tiny-cuda-nn was compiled separately on each system with the appropriate value of TCNN_MIN_GPU_ARCH).
| GPU | FullyFusedMLP | CutlassMLP |
| --- | --- | --- |
| RTX 3090 | Works | Works |
| RTX 3070M | Works | Works |
| RTX 2070 SUPER | Produces zeros as output | Crash |
| GTX 1060M | Not supported (as expected) | Crash |
The documentation says:

> The fully fused MLP component of this framework requires a very large amount
> of shared memory in its default configuration.
> It will likely only work on an RTX 3090, an RTX 2080 Ti, or higher-end GPUs.
Why is this the case? Looking at the CUDA documentation, the maximum amount of shared memory per SM/thread block depends on the GPU's compute capability/architecture, not on the card's market tier (low-end vs. high-end).
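For reference, the shared-memory limits for the cards above can be queried directly from the CUDA runtime. This is a minimal sketch (not part of tiny-cuda-nn) that prints the per-block and per-SM shared memory reported by `cudaGetDeviceProperties`; per the CUDA programming guide, these limits are indeed determined by compute capability (e.g. the RTX 2070 SUPER and 2080 Ti are both CC 7.5 and report the same limits), which is why the "high-end vs. low-end" wording in the docs is confusing:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found.\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, dev);
        std::printf("%s (compute capability %d.%d)\n",
                    prop.name, prop.major, prop.minor);
        // Default static limit per thread block (48 KiB on most architectures).
        std::printf("  shared mem per block (default): %zu KiB\n",
                    prop.sharedMemPerBlock / 1024);
        // Larger limit available via cudaFuncSetAttribute opt-in.
        std::printf("  shared mem per block (opt-in):  %zu KiB\n",
                    prop.sharedMemPerBlockOptin / 1024);
        // Total shared memory available on each SM.
        std::printf("  shared mem per SM:              %zu KiB\n",
                    prop.sharedMemPerMultiprocessor / 1024);
    }
    return 0;
}
```

Compile with `nvcc query_smem.cu -o query_smem` and run on each system; comparing the opt-in per-block figure across the cards in the table would show whether the FullyFusedMLP failures line up with a shared-memory limit or with something else.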