KernelAbstractions.jl

On CPU always use `NoDynamicCheck()`, just finish the last partial workgroup with `DynamicCheck()`

Open: rafaqz opened this issue 1 year ago • 1 comment

Given that `DynamicCheck()` breaks SIMD, this can be an order of magnitude faster for some inexpensive tasks.

I'll write up a better MWE, but this is the scale of it: a single-threaded game of life in DynamicGrids.jl (basically summing a 3x3 window over `Bool`) is 2x faster than an 8-core KernelAbstractions.jl sim, pretty much just from `DynamicCheck()`:

```julia
julia> using DynamicGrids, BenchmarkTools

julia> init = rand(Bool, 1000, 1000);

julia> output = ResultOutput(init; tspan=1:200);

julia> @btime sim!($output, Life(); proc=SingleCPU());
  338.058 ms (6459 allocations: 3.25 MiB)

julia> @btime sim!($output, Life(); proc=CPUGPU());
  652.198 ms (18401 allocations: 4.63 MiB)
```
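The proposal in the title can be sketched in plain Julia, without touching KernelAbstractions internals. This is only an illustration under assumptions: `process_checked!` and `process_split!` are hypothetical helpers, the doubling "kernel" stands in for real work, and `wg` plays the role of the workgroup size. The first version checks every index, roughly what a per-item dynamic check does; the second runs the full workgroups with no check and keeps the check only for the last partial workgroup.

```julia
# Hypothetical helpers for illustration; not part of KernelAbstractions.jl.

# Every work item is checked against the valid range,
# analogous to running the whole kernel under a dynamic check.
function process_checked!(dst, src, wg)
    n = length(src)
    for g in 1:cld(n, wg)          # number of workgroups, rounded up
        base = (g - 1) * wg
        for l in 1:wg
            i = base + l
            if i <= n              # per-item branch in the hot loop
                @inbounds dst[i] = 2 * src[i]
            end
        end
    end
    return dst
end

# Full workgroups run without any check; only the tail is checked.
function process_split!(dst, src, wg)
    n = length(src)
    nfull = fld(n, wg)             # number of complete workgroups
    for g in 1:nfull
        base = (g - 1) * wg
        # Every index here is known to be valid, so there is no branch.
        @inbounds @simd for l in 1:wg
            dst[base + l] = 2 * src[base + l]
        end
    end
    # Last partial workgroup (possibly empty): keep the check here only.
    base = nfull * wg
    for l in 1:wg
        i = base + l
        if i <= n
            @inbounds dst[i] = 2 * src[i]
        end
    end
    return dst
end

src = collect(1:1003)              # deliberately not a multiple of wg
out_checked = process_checked!(similar(src), src, 64)
out_split = process_split!(similar(src), src, 64)
```

Both versions produce the same result; whether the split version is actually faster depends on the kernel and on what the compiler manages to vectorize, so it would need benchmarking.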

rafaqz, Jan 07 '24 14:01

It seems `DynamicCheck` is only half the problem. Removing it helps a lot, but something else is also stopping the compiler from constant-propagating size information from the type (it's like a sized array) through the KernelAbstractions kernel, information that it can see in the single-threaded version.
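To make "size information from the type" concrete, here is a minimal sketch of the idea. It assumes a hypothetical wrapper `SizedGrid` and a hypothetical function `count_alive`; it does not reproduce how DynamicGrids.jl or KernelAbstractions.jl actually represent grids. The dimensions are type parameters, so inside a specialized method the loop bounds are compile-time constants rather than values loaded at run time.

```julia
# Hypothetical wrapper for illustration: rows R and columns C are type parameters.
struct SizedGrid{R,C,T,A<:AbstractMatrix{T}}
    data::A
end

# Convenience constructor that checks the data against the declared size.
function SizedGrid{R,C}(data::A) where {R,C,T,A<:AbstractMatrix{T}}
    size(data) == (R, C) || throw(DimensionMismatch("expected a $(R)x$(C) matrix"))
    return SizedGrid{R,C,T,A}(data)
end

# The size is recovered from the type alone, with no field access.
Base.size(::SizedGrid{R,C}) where {R,C} = (R, C)

# R and C are constants in each specialization of this method,
# so the compiler knows the loop trip counts when it compiles it.
function count_alive(g::SizedGrid{R,C}) where {R,C}
    s = 0
    @inbounds for j in 1:C, i in 1:R
        s += g.data[i, j]
    end
    return s
end

grid = SizedGrid{3,4}(fill(true, 3, 4))
alive = count_alive(grid)
```

If the grid is passed through a layer that only sees the wrapped array or a run-time `size`, these constants are no longer visible to the compiler, which is one plausible way the information could get lost.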

I will have to fix it to find out what the problem is, so will probably submit a PR sometime.

rafaqz, Jan 08 '24 22:01