gemm
Provide benchmarks with throughput units (GFlop/s, TFlop/s)
Hello fellow gemm optimizer enthusiast,
It would be extremely useful to provide benchmark utilities, ideally reporting GFlop/s or TFlop/s, to compare with other frameworks, with the CPU's theoretical peak throughput, and also with LINPACK.
For an MxK matrix multiplied by a KxN matrix:
- total required operations: 2*M*N*K (the factor 2 accounts for 1 mul and 1 add), which you divide by the time taken to get Flop/s.

Additionally, you might want to track the required data to derive the arithmetic intensity for the roofline model:
- required data: M*K + K*N input elements (see the small sketch below).
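If it helps, here is a minimal Nim sketch of both computations (the proc names and example sizes are mine, purely illustrative, not part of this library):

```nim
proc gemmGFlops(M, N, K: int; seconds: float): float =
  ## Effective throughput: 2*M*N*K flops divided by the elapsed time, in GFlop/s.
  2.0 * M.float * N.float * K.float / seconds / 1e9

proc arithmeticIntensity(M, N, K: int): float =
  ## Flop per input element (M*K + K*N elements read); multiply the denominator
  ## by the element size in bytes if you prefer Flop/byte on the roofline x-axis.
  2.0 * M.float * N.float * K.float / (M.float * K.float + K.float * N.float)

when isMainModule:
  # Made-up example: a 1920x1920x1920 GEMM that took 50 ms.
  echo gemmGFlops(1920, 1920, 1920, 0.050), " GFlop/s"
  echo arithmeticIntensity(1920, 1920, 1920), " Flop/element"
```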
And finally, you might also want to check against your theoretical peak, computed like this: https://github.com/mratsim/weave/blob/b6255af/benchmarks/matmul_gemm_blas/gemm_bench_config.nim#L5-L18
```nim
const
  CpuGhz = 3.5       # i9-9980XE OC All turbo 4.1GHz (AVX2 4.0GHz, AVX512 3.5GHz)
  NumCpuCores = 18
  VectorWidth = 16   # 8 float32 for AVX2, 16 for AVX512
  InstrCycle = 2     # How many instructions per cycle (2x FMA or 1x FMA for example)
  FlopInstr = 2      # How many FLOP per instruction (FMA = 1 add + 1 mul)

  TheoSerialPeak* = CpuGhz * VectorWidth * InstrCycle * FlopInstr
  TheoThreadedPeak* = TheoSerialPeak * NumCpuCores
```
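Percent-of-peak then falls out as a simple ratio; a tiny sketch using the constants above (the measured value is hypothetical):

```nim
const
  TheoThreadedPeak = 3.5 * 16 * 2 * 2 * 18  # GFlop/s, the i9-9980XE numbers above (~4032)

proc percentOfPeak(measuredGFlops: float): float =
  ## Fraction of the machine's theoretical ceiling actually reached, in percent.
  measuredGFlops / TheoThreadedPeak * 100.0

echo percentOfPeak(2000.0)  # a hypothetical 2000 GFlop/s run -> roughly 50% of peak
```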
FYI, you might be interested in my own research on cache utilization tuning; skimming a bit, I see that you tuned at the cache-associativity level while I used some heuristics:
- https://github.com/bluss/matrixmultiply/issues/34#issuecomment-445412450
Benchmarks of my own implementation (+ OpenMP) against OpenBLAS/MKL and MKL-DNN (the latest oneDNN was too entangled to extract the relevant GEMM primitives):
- https://github.com/mratsim/laser
https://github.com/mratsim/laser/blob/d310294/benchmarks/gemm/gemm_bench_float32.nim#L374
Nim must be installed, as well as OpenBLAS or MKL (the git submodule will download MKL-DNN), and then:
```
git clone https://github.com/mratsim/laser
cd laser
git submodule init
nim cpp -r -d:danger -d:openmp --outdir:build benchmarks/gemm/gemm_bench_float32.nim
```
Benchmarks with my own multithreading runtime (instead of OpenMP):
- https://github.com/mratsim/weave
https://github.com/mratsim/weave/blob/b6255af/benchmarks/matmul_gemm_blas/all_gemm.nim
Nim must be installed, as well as OpenBLAS or MKL (the git submodule will download MKL-DNN), and then:
If using Intel MKL, the library path can be customized here: https://github.com/mratsim/weave/blob/b6255af/benchmarks/matmul_gemm_blas/all_gemm.nim

```
git clone https://github.com/mratsim/weave
cd weave
nim c -r -d:danger -threads:on --outdir:build benchmarks/matmul_gemm_blas/all_gemm.nim
```
Thanks for the suggestion. I'll set up something for that soon.