Bo Qiao
This is a single-buffer implementation of parallel scan (prefix sum). References: https://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/scan/doc/sc https://github.com/NVIDIA/cuda-samples/blob/master/Samples/2_Concepts_and_Techniques/shfl_sc This will be useful for certain Taichi implementations such as PBD.
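For readers unfamiliar with the term, here is a minimal sequential sketch of what a single-buffer (in-place) prefix sum computes. It uses the well-known up-sweep/down-sweep scheme and produces an exclusive scan. The function name `exclusive_scan_inplace` is hypothetical, the power-of-two length restriction is a simplification, and whether the implementation described above follows this exact scheme is an assumption. On a GPU, each inner loop would run as one parallel step.

```python
def exclusive_scan_inplace(data):
    """In-place exclusive prefix sum (illustrative sketch, not the Taichi code).

    Assumes len(data) is a power of two; each inner `for` loop stands in
    for one parallel step in which every index is handled by its own thread.
    """
    n = len(data)
    if n == 0:
        return data
    assert n & (n - 1) == 0, "sketch assumes a power-of-two length"

    # Up-sweep (reduce): build partial sums in place, doubling the stride.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data


values = [1, 2, 3, 4, 5, 6, 7, 8]
exclusive_scan_inplace(values)
print(values)
```

No second buffer is allocated: both phases read and write the same list, which is the property the "single-buffer" wording refers to.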
We would like to share our proposal for modernizing Taichi's CMake-based build system. By embracing the target-based approach, we can enforce a good modular design in our code base. This...
Currently we support warp-based parallel [scan](https://github.com/taichi-dev/taichi_benchmark/blob/main/prefix_scan/taichi/scan.py) for Vulkan and CUDA. Let's use this issue to track some performance data: ENV: RTX 3080 with driver 510, CUDA 11.6. | Number of elements...