MaBLAS.jl

Autotuning

Open YingboMa opened this issue 5 years ago • 0 comments

Please don't be scared by the title or think it's going to take a few days :-). It should be done in less than 10 minutes. Here is the plan @chriselrod and I came up with.

  1. search for a good kernel size
  2. compute the cache size with an analytical model
  3. search for a good packing strategy

[1] can be done by directly calling the packing=(Val(true), Val(true)) macro kernel with different values of micro_m and micro_n, and benchmarking it on 400 x 400 and 397 x 397 DGEMM (all other types can be handled by just rescaling micro_m).
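A rough sketch of what the search in [1] could look like. The actual MaBLAS macro kernel call is not shown in this issue, so `placeholder_kernel!` is a hypothetical stand-in that ignores the tile sizes and just calls `LinearAlgebra.mul!`. The candidate `micro_ms` and `micro_ns` values are made-up assumptions too.

```julia
using LinearAlgebra

# Hypothetical stand-in for the packing=(Val(true), Val(true)) macro kernel.
# It ignores micro_m and micro_n; a real tuner would call into MaBLAS here.
placeholder_kernel!(C, A, B, micro_m, micro_n) = mul!(C, A, B)

# Best (minimum) wall time over `reps` runs on an n x n Float64 problem.
function time_kernel(kernel!, n, micro_m, micro_n; reps = 3)
    A, B, C = rand(n, n), rand(n, n), zeros(n, n)
    kernel!(C, A, B, micro_m, micro_n)  # warm-up run, excludes compilation
    best = Inf
    for _ in 1:reps
        best = min(best, @elapsed kernel!(C, A, B, micro_m, micro_n))
    end
    return best
end

# Try every candidate (micro_m, micro_n) pair and keep the one with the
# smallest total time over the two benchmark sizes mentioned above.
function search_kernel_size(kernel!; micro_ms = (8, 16, 24), micro_ns = (4, 6, 8),
                            sizes = (400, 397))
    best, best_time = (first(micro_ms), first(micro_ns)), Inf
    for mm in micro_ms, mn in micro_ns
        t = sum(time_kernel(kernel!, n, mm, mn) for n in sizes)
        if t < best_time
            best, best_time = (mm, mn), t
        end
    end
    return best, best_time
end

best, best_time = search_kernel_size(placeholder_kernel!)
```

With the placeholder, all candidates run the same code, so the winner only reflects timing noise. The point is the shape of the loop: a small grid, two problem sizes, minimum-of-a-few-runs timing.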

[2] can be done with formulae that depend on the cache properties.
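To make [2] a bit more concrete, here is a simplified capacity-only heuristic. It is not necessarily the model intended here: it ignores associativity and cache-line effects, and the function name, the default cache sizes, and the `frac` safety factor are all assumptions for illustration.

```julia
# Simplified sketch: pick block sizes so the working set of each loop level
# fits in a fraction `frac` of the corresponding cache (sizes in bytes).
function block_sizes(::Type{T}, micro_m, micro_n;
                     l1 = 32 * 1024, l2 = 256 * 1024, l3 = 8 * 1024^2,
                     frac = 0.5) where {T}
    elt = sizeof(T)
    # kc: a micro_m x kc panel of A plus a kc x micro_n panel of B fit in L1.
    kc = floor(Int, frac * l1 / (elt * (micro_m + micro_n)))
    # mc: an mc x kc block of A fits in L2, rounded down to a multiple of micro_m.
    mc = floor(Int, frac * l2 / (elt * kc))
    mc -= mc % micro_m
    # nc: a kc x nc block of B fits in L3, rounded down to a multiple of micro_n.
    nc = floor(Int, frac * l3 / (elt * kc))
    nc -= nc % micro_n
    return (mc = mc, kc = kc, nc = nc)
end

bs = block_sizes(Float64, 8, 6)
```

Because the element size enters through `sizeof(T)`, the same formulae give larger blocks for `Float32` without any extra search.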

[3] can be done efficiently with bisection, assuming there is one and only one crossing.
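A minimal sketch of the bisection in [3], assuming the quantity being searched is the smallest matrix size at which packing becomes faster. `bisect_crossover` and the two toy cost models are hypothetical; in a real tuner the predicate would time both packing settings at size `n`.

```julia
# Smallest n in lo:hi with pred(n) == true, assuming pred is false below some
# threshold and true from there on (exactly one crossing).
function bisect_crossover(pred, lo::Integer, hi::Integer)
    pred(lo) && return lo
    pred(hi) || return nothing  # no crossing inside the range
    # Invariant: pred(lo) is false and pred(hi) is true.
    while hi - lo > 1
        mid = lo + (hi - lo) ÷ 2
        if pred(mid)
            hi = mid
        else
            lo = mid
        end
    end
    return hi
end

# Toy integer cost models standing in for real timings (made-up numbers):
# packing adds an O(n^2) copy cost but makes the O(n^3) part cheaper.
t_nopack(n) = 6 * n^3
t_pack(n) = 5 * n^3 + 250 * n^2
packing_wins(n) = t_pack(n) < t_nopack(n)

crossover = bisect_crossover(packing_wins, 1, 4096)
```

Bisection needs only about log2(hi - lo) evaluations of the predicate, which matters when each evaluation is a benchmark run.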

The autotuning is off by default, and one can enable it with

ENV["AUTOTUNE_MABLAS"] = true
] build MaBLAS
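For reference, a build script could read the flag along these lines. This is only a sketch: `autotune_enabled` is a hypothetical helper, and which spellings count as "on" is an assumption. Note that assigning `true` to an `ENV` entry stores the string "true".

```julia
# Hypothetical helper: returns true when AUTOTUNE_MABLAS is set to an "on"
# value. Takes the environment as an argument so it can be tested with a Dict.
function autotune_enabled(env = ENV)
    raw = lowercase(strip(get(env, "AUTOTUNE_MABLAS", "false")))
    return raw in ("true", "1", "yes")
end

on = autotune_enabled(Dict("AUTOTUNE_MABLAS" => "true"))
off = autotune_enabled(Dict{String,String}())
```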

YingboMa, May 16 '20 22:05