bigwheels
Fluid simulation - Improve compute utilization
From https://github.com/google/bigwheels/pull/20#discussion_r1041302625
Dispatching a compute shader with large dimensions and numthreads(1,1,1) is very bad for performance because of reduced occupancy. Most common graphics cards execute threads in lockstep groups of 32 (a warp on NVIDIA) or 64 (a wavefront on AMD), so a thread group containing a single thread leaves 31 or 63 lanes idle. The GPU cannot use those idle lanes for other work, meaning you're getting at most something like 1/32 of the GPU performance in this compute-heavy application.
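You'd want to use something like numthreads(8,8,1) and then dispatchSize(ppx::float3(ceil(dr.mOutput->GetWidth() / 8.0f), ceil(dr.mOutput->GetHeight() / 8.0f), 1)). Note the division by 8.0f: if GetWidth() and GetHeight() return integers, dividing by 8 truncates before ceil can round up, and the last partial row and column of tiles would be skipped. You also need to modify the shaders to compute XY coordinates slightly differently, for example by deriving them from the per-thread ID rather than the group ID and skipping threads that fall outside the output.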
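As a rough sketch of the arithmetic involved, the C++ below computes a rounded-up group count with integer math and emulates on the CPU the coordinate calculation each shader thread would perform. `GroupCount` and `CountInBoundsThreads` are hypothetical helpers written for illustration, not BigWheels APIs, and the 8x8 group size is just the value suggested above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a BigWheels API): number of thread groups needed
// to cover `size` items with `groupSize` threads per group, rounding up.
inline uint32_t GroupCount(uint32_t size, uint32_t groupSize)
{
    assert(groupSize > 0);
    return (size + groupSize - 1) / groupSize;
}

// CPU emulation of what each shader thread would do with numthreads(N,N,1):
// coordinate = group ID * N + thread ID within the group. Threads that land
// outside the image (because the group count was rounded up) are skipped.
// Returns how many threads map to a valid pixel.
inline uint64_t CountInBoundsThreads(uint32_t width, uint32_t height, uint32_t groupSize)
{
    uint64_t covered = 0;
    const uint32_t groupsX = GroupCount(width, groupSize);
    const uint32_t groupsY = GroupCount(height, groupSize);
    for (uint32_t gy = 0; gy < groupsY; ++gy) {
        for (uint32_t gx = 0; gx < groupsX; ++gx) {
            for (uint32_t ly = 0; ly < groupSize; ++ly) {
                for (uint32_t lx = 0; lx < groupSize; ++lx) {
                    const uint32_t x = gx * groupSize + lx;
                    const uint32_t y = gy * groupSize + ly;
                    if (x >= width || y >= height) {
                        continue; // out-of-bounds thread: do no work
                    }
                    ++covered;
                }
            }
        }
    }
    return covered;
}
```

With these helpers, a 1920x1081 output would need 240x136 groups of 8x8 threads, and every pixel is visited exactly once as long as the shader performs the same bounds check.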