taichi_benchmark
Parallel scan performance gap between Vulkan and CUDA
We currently support warp-based parallel scan on both Vulkan and CUDA. Let's use this issue to track performance data.
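For readers unfamiliar with the term, here is a minimal pure-Python sketch of what a warp-based inclusive scan computes. It is only an illustration and not the Taichi implementation: the names `WARP_SIZE`, `warp_inclusive_scan` and `inclusive_scan` are hypothetical, the warp width of 32 is an assumption that matches NVIDIA hardware, and the carry between warps is propagated serially here, whereas a GPU kernel would typically scan the per-warp totals in parallel.

```python
WARP_SIZE = 32  # assumed warp width; hypothetical constant for this sketch


def warp_inclusive_scan(lane_values):
    """Simulate a Kogge-Stone inclusive scan across the lanes of one warp."""
    vals = list(lane_values)
    offset = 1
    while offset < len(vals):
        # Snapshot the lanes first: on a GPU all lanes read simultaneously,
        # similar to a shuffle-up by `offset` lanes.
        prev = vals[:]
        for lane in range(offset, len(vals)):
            vals[lane] = prev[lane] + prev[lane - offset]
        offset *= 2
    return vals


def inclusive_scan(data, warp_size=WARP_SIZE):
    """Scan each warp-sized chunk, then add the total of all earlier chunks."""
    out = []
    carry = 0
    for start in range(0, len(data), warp_size):
        scanned = warp_inclusive_scan(data[start:start + warp_size])
        out.extend(v + carry for v in scanned)
        carry = out[-1]  # running total up to the end of this chunk
    return out


if __name__ == "__main__":
    print(inclusive_scan([3, 1, 4, 1, 5]))
```

Each warp needs only log2(32) = 5 shuffle steps, which is why this pattern is attractive on both backends.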
Environment: RTX 3080, driver 510, CUDA 11.6.
| Number of elements | Vulkan | CUDA |
|---|---|---|
| 131072 | 0.348 ms | 0.160 ms |
| 65536 | 0.308 ms | 0.111 ms |
| 32768 | 0.311 ms | 0.114 ms |
| 16384 | 0.232 ms | 0.082 ms |
| 8192 | 0.222 ms | 0.075 ms |
| 4096 | 0.183 ms | 0.075 ms |
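To make the gap easier to read, the short script below computes the Vulkan-to-CUDA time ratio for each row. The numbers are copied from the table above; the names `timings_ms` and `slowdown` are just illustrative. On this data the ratio falls between roughly 2.2x and 3.0x.

```python
# Timings in milliseconds, copied from the table: (elements, Vulkan, CUDA)
timings_ms = [
    (131072, 0.348, 0.160),
    (65536, 0.308, 0.111),
    (32768, 0.311, 0.114),
    (16384, 0.232, 0.082),
    (8192, 0.222, 0.075),
    (4096, 0.183, 0.075),
]


def slowdown(vulkan_ms, cuda_ms):
    """Return how many times longer the Vulkan run took than the CUDA run."""
    return vulkan_ms / cuda_ms


# Map element count to the Vulkan/CUDA time ratio.
ratios = {n: slowdown(vulkan, cuda) for n, vulkan, cuda in timings_ms}

if __name__ == "__main__":
    for n, ratio in ratios.items():
        print(f"{n:>7} elements: Vulkan takes {ratio:.2f}x the CUDA time")
```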