KernelAbstractions.jl
On CPU, always use `NoDynamicCheck()` and handle only the last partial workgroup with `DynamicCheck()`
Because `DynamicCheck()` prevents SIMD vectorization, this can be an order of magnitude faster for some inexpensive tasks.
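To make the proposed split concrete, here is a minimal sketch in plain Julia. It deliberately does not use the KernelAbstractions API. The names `split_workgroups` and `scale_split!` are hypothetical. Full workgroups run with no per-item check, so the inner loop can vectorize. Only the trailing partial workgroup guards each index, which is the work a dynamic check would do.

```julia
# Hypothetical sketch (plain Julia, NOT the KernelAbstractions API).

"""
    split_workgroups(n, groupsize) -> (nfull, tail)

Number of complete workgroups, and the size of the trailing partial one.
"""
split_workgroups(n::Integer, groupsize::Integer) = divrem(n, groupsize)

function scale_split!(out::Vector{Float64}, x::Vector{Float64}, a::Float64, groupsize::Int)
    groupsize > 0 || throw(ArgumentError("groupsize must be positive"))
    n = length(x)
    length(out) == n || throw(DimensionMismatch("out and x must have the same length"))
    nfull, tail = split_workgroups(n, groupsize)

    # Full groups: every index is known to be in range, so there is no
    # per-item check and the loop is free to vectorize ("NoDynamicCheck" path).
    for g in 0:nfull-1
        base = g * groupsize
        @inbounds @simd for l in 1:groupsize
            i = base + l
            out[i] = a * x[i]
        end
    end

    # Last partial group: iterate over a whole group's worth of work items,
    # but guard each one against running past `n` ("DynamicCheck" path).
    if tail > 0
        base = nfull * groupsize
        for l in 1:groupsize
            i = base + l
            i <= n || continue
            out[i] = a * x[i]
        end
    end
    return out
end
```

With `n = 10` and `groupsize = 4`, the first eight elements go through the unchecked loop and only the last two go through the guarded one. The larger the array relative to the workgroup size, the smaller the fraction of work that pays for the check.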
I'll write up a better MWE, but here is the scale of it. A single-threaded Game of Life in DynamicGrids.jl (basically summing a 3x3 window over `Bool`) is 2x faster than an 8-core KernelAbstractions.jl simulation, and that is pretty much just because of `DynamicCheck()`:
```julia
julia> using DynamicGrids, BenchmarkTools

julia> init = rand(Bool, 1000, 1000);

julia> output = ResultOutput(init; tspan=1:200);

julia> @btime sim!($output, Life(); proc=SingleCPU());
  338.058 ms (6459 allocations: 3.25 MiB)

julia> @btime sim!($output, Life(); proc=CPUGPU());
  652.198 ms (18401 allocations: 4.63 MiB)
```
It seems `DynamicCheck()` is only half the problem. Removing it helps a lot, but something else also prevents the compiler from constant-propagating the size information carried in the type (it's like a sized array). In the single-threaded version the compiler can see that information, but it is lost through the KernelAbstractions kernel.
I'll have to dig into it to find out what the problem is, so I'll probably submit a PR at some point.
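As a rough illustration of "size information in the type", here is a minimal sketch. It is not how DynamicGrids.jl actually stores its neighborhoods. The `Window{R}` type and both `window_sum` methods are hypothetical. When the radius is a type parameter, each specialization sees constant loop bounds. When it is a plain `Int`, the bounds are only known at run time. The caller must keep `(i, j)` at least `R` cells away from the edges.

```julia
# Hypothetical sketch: the window radius lives in the type, like a sized array.
struct Window{R} end

# Radius known at compile time: for Window{1}() the loops are always -1:1,
# so the compiler can fully unroll and vectorize this specialization.
function window_sum(A::AbstractMatrix{Bool}, i::Int, j::Int, ::Window{R}) where {R}
    s = 0
    for dj in -R:R, di in -R:R
        s += A[i+di, j+dj]   # Bool promotes to Int when added
    end
    return s
end

# Radius known only at run time: the same loop, but with dynamic bounds.
function window_sum(A::AbstractMatrix{Bool}, i::Int, j::Int, r::Int)
    s = 0
    for dj in -r:r, di in -r:r
        s += A[i+di, j+dj]
    end
    return s
end
```

Both methods return the same count. The difference only appears in the generated code, which can be compared with `@code_llvm`. If a kernel launch wrapper erases the `Window{R}` type (or the array's static size) before the inner loop, the compiler falls back to the run-time-bounds version.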