MaBLAS.jl
Autotuning
Please don't be scared by the title and assume autotuning will take days to run :-). It should finish in less than 10 minutes. Here is the plan @chriselrod and I came up with:
1. Search for a good micro-kernel size.
2. Compute the cache blocking sizes with an analytical model.
3. Search for a good packing strategy.
[1] can be done by directly calling the macro kernel with packing=(Val(true), Val(true)) for different micro_m and micro_n values, and benchmarking it on 400 × 400 and 397 × 397 DGEMM problems. All other element types can then be handled by simply rescaling micro_m.
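A minimal sketch of the step [1] search loop, assuming a caller-supplied timing function. time_kernel, search_kernel_size, and rescale_micro_m are hypothetical names, not MaBLAS functions; in practice time_kernel would benchmark the packed macro kernel on an n × n DGEMM. The rescaling rule (scale micro_m by sizeof(Float64) / sizeof(T), so each column still fills the same number of SIMD registers) is an assumption about what "rescaling micro_m" means.

```julia
# Hypothetical sketch: exhaustive search over (micro_m, micro_n) candidates.
# `time_kernel(micro_m, micro_n, n)` stands in for benchmarking the packed
# macro kernel on an n × n DGEMM; it is NOT a MaBLAS function.
function search_kernel_size(time_kernel, candidates; sizes = (400, 397))
    best = first(candidates)
    best_time = Inf
    for (m, n) in candidates
        # Sum over both sizes: 397 is prime, so no tile divides it evenly and
        # kernels with poor edge handling are penalized.
        t = sum(time_kernel(m, n, sz) for sz in sizes)
        if t < best_time
            best, best_time = (m, n), t
        end
    end
    return best
end

# Assumed rule for other element types: keep the same register footprint,
# so micro_m scales with sizeof(Float64) / sizeof(T).
rescale_micro_m(micro_m_f64, ::Type{T}) where {T} = micro_m_f64 * sizeof(Float64) ÷ sizeof(T)

# Toy timing model for illustration only: it pretends (8, 6) is optimal.
toy_time(m, n, sz) = abs(m - 8) + abs(n - 6) + sz / 1000
candidates = [(m, n) for m in (4, 8, 12, 16) for n in (2, 4, 6, 8)]
best = search_kernel_size(toy_time, candidates)  # → (8, 6)
rescale_micro_m(8, Float32)                      # → 16
```

An exhaustive grid is affordable here because the candidate set is small and each benchmark runs on modest 400 × 400 problems.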
[2] can be done with formulae that depend on the cache properties (e.g., the size of each cache level).
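To show the shape of step [2], here is a deliberately simplified heuristic. It is not the analytical model the plan refers to. It assumes each cache level devotes half its capacity to the data reused from it. The function name blocking_sizes and the default cache sizes (32 KiB L1, 256 KiB L2, 8 MiB L3) are hypothetical placeholders; real values would come from the target CPU.

```julia
# Simplified heuristic, NOT the analytical model from the plan:
# give each cache level half its capacity for the data reused from it.
function blocking_sizes(micro_m, micro_n, ::Type{T};
                        l1 = 32 * 1024, l2 = 256 * 1024, l3 = 8 * 1024^2) where {T}
    s = sizeof(T)
    # L1: a micro_m × kc sliver of A plus a kc × micro_n sliver of B.
    kc = fld(l1 ÷ 2, (micro_m + micro_n) * s)
    # L2: the packed mc × kc block of A, rounded down to a multiple of micro_m.
    mc = fld(l2 ÷ 2, kc * s) ÷ micro_m * micro_m
    # L3: the packed kc × nc panel of B, rounded down to a multiple of micro_n.
    nc = fld(l3 ÷ 2, kc * s) ÷ micro_n * micro_n
    return (mc, kc, nc)
end

blocking_sizes(8, 6, Float64)  # → (112, 146, 3588)
```

Rounding mc and nc to multiples of the micro-kernel size keeps the packed blocks free of partial tiles except at the matrix edges.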
[3] can be done efficiently with bisection, assuming the performance curves of the packing strategies cross once and only once as the matrix size grows.
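A minimal sketch of the step [3] bisection, under the single-crossing assumption. The predicate packing_wins(n) and the name find_crossover are hypothetical. The predicate is assumed to benchmark both packing strategies at size n and return true once the packed variant is faster. Because each call is an expensive benchmark, bisection matters: it needs only about log2(hi - lo) calls.

```julia
# Hypothetical sketch: smallest n in [lo, hi] where packing wins, assuming
# `packing_wins(n)` flips from false to true exactly once (single crossing).
function find_crossover(packing_wins, lo::Integer, hi::Integer)
    packing_wins(hi) || return nothing   # no crossing inside the interval
    packing_wins(lo) && return lo        # packing already wins at lo
    # Invariant: packing_wins(lo) == false and packing_wins(hi) == true.
    while hi - lo > 1
        mid = (lo + hi) ÷ 2
        if packing_wins(mid)
            hi = mid
        else
            lo = mid
        end
    end
    return hi
end

# Toy predicate for illustration: packing starts to win at n = 137.
find_crossover(n -> n >= 137, 1, 1000)  # → 137
```

If the curves crossed more than once, bisection would still return a point where the predicate flips, but not necessarily the smallest one. That is why the plan states the single-crossing assumption explicitly.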
Autotuning is off by default. To enable it, set the AUTOTUNE_MABLAS environment variable and rebuild the package (in the REPL, typing ] switches to Pkg mode):
ENV["AUTOTUNE_MABLAS"] = true
] build MaBLAS
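For reference, Julia's ENV stores values as strings, so assigning true stores "true". The sketch below shows how a build script could read the flag. autotune_enabled is a hypothetical helper and does not reflect MaBLAS's actual build script. The Pkg-mode command has a programmatic equivalent, Pkg.build("MaBLAS"), shown as a comment so the snippet runs without MaBLAS installed.

```julia
# Hypothetical helper: how a build script *could* read the flag.
# This is not taken from MaBLAS's actual build script.
autotune_enabled() = lowercase(get(ENV, "AUTOTUNE_MABLAS", "false")) in ("true", "1")

ENV["AUTOTUNE_MABLAS"] = true   # ENV converts the value with string(), storing "true"
autotune_enabled()              # → true

# Programmatic equivalent of `] build MaBLAS`:
# using Pkg; Pkg.build("MaBLAS")
```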