
Parallel scan performance gap between Vulkan and CUDA

Open qiao-bo opened this issue 2 years ago • 0 comments

Currently we support warp-based parallel scan for Vulkan and CUDA. Let's use this issue to track some performance data:

ENV: RTX3080 with Driver 510. CUDA 11.6.

| Number of elements | Vulkan | CUDA |
| --- | --- | --- |
| 131072 | 0.348 ms | 0.160 ms |
| 65536 | 0.308 ms | 0.111 ms |
| 32768 | 0.311 ms | 0.114 ms |
| 16384 | 0.232 ms | 0.082 ms |
| 8192 | 0.222 ms | 0.075 ms |
| 4096 | 0.183 ms | 0.075 ms |
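
To make the gap easier to read at a glance, here is a small sketch that turns the timings above into Vulkan-to-CUDA time ratios. The numbers are copied from the table; the names `timings` and `slowdown` are just illustrative helpers and are not part of Taichi or the benchmark suite.

```python
# Timings in milliseconds, copied from the table above:
# (number of elements, Vulkan time, CUDA time)
timings = [
    (131072, 0.348, 0.160),
    (65536, 0.308, 0.111),
    (32768, 0.311, 0.114),
    (16384, 0.232, 0.082),
    (8192, 0.222, 0.075),
    (4096, 0.183, 0.075),
]


def slowdown(vulkan_ms, cuda_ms):
    """Return how many times longer the Vulkan run took than the CUDA run."""
    return vulkan_ms / cuda_ms


# Map each problem size to its Vulkan/CUDA time ratio.
ratios = {n: slowdown(v, c) for n, v, c in timings}

for n, r in ratios.items():
    print(f"{n:>7} elements: Vulkan takes {r:.2f}x the CUDA time")
```

On these measurements the ratio stays between roughly 2.2x and 3.0x across all sizes, so the gap does not appear to shrink much as the input grows.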

qiao-bo • Jul 21 '22 06:07