
Openblas sgemm is slower for small size matrices in aarch64

Open akote123 opened this issue 11 months ago • 16 comments

I built OpenBLAS on a Graviton3E machine with make USE_OPENMP=1 NUM_THREADS=256 TARGET=NEOVERSEV1. MKL was built on an Ice Lake machine.

I have used openblas sgemm as cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, 1.0, A, K, B, N, 0.0, C, N);
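For readers checking the arguments of that call, here is a minimal sketch of what it computes in row-major, no-transpose form, with lda = K, ldb = N and ldc = N. The names ref_sgemm_rowmajor and sgemm_flops are hypothetical helpers written for illustration; they are not part of OpenBLAS or MKL, and the triple loop is a naive reference, not how either library implements sgemm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not an OpenBLAS API).
 * Computes C = alpha * A * B + beta * C for row-major, non-transposed
 * operands, using the same leading-dimension convention as
 * cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, ...):
 *   A is M x K with row stride lda (>= K)
 *   B is K x N with row stride ldb (>= N)
 *   C is M x N with row stride ldc (>= N) */
void ref_sgemm_rowmajor(int M, int N, int K, float alpha,
                        const float *A, int lda,
                        const float *B, int ldb,
                        float beta, float *C, int ldc)
{
    assert(lda >= K && ldb >= N && ldc >= N);
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[(size_t)i * lda + k] * B[(size_t)k * ldb + j];
            float *c = &C[(size_t)i * ldc + j];
            /* When beta is zero, overwrite C without reading its old value. */
            *c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
}

/* Conventional GEMM operation count, 2*M*N*K, useful for turning a
 * measured time into FLOP/s when comparing shapes. */
double sgemm_flops(int M, int N, int K)
{
    return 2.0 * (double)M * (double)N * (double)K;
}
```

Comparing the output of such a reference against cblas_sgemm for one small shape is a quick way to confirm the leading dimensions are right before timing anything. Dividing sgemm_flops(M, N, K) by the measured seconds gives a rate that is easier to compare across shapes than raw timings.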

When the timings are compared with Intel MKL for the smaller matmul sizes, aarch64 is slower.

[Image: openblasvsmkl]

These are the different shapes I have checked and their timings.

akote123, Mar 26 '24 09:03