OpenBLAS
Adding packed gemm APIs
Hi Xianyi,
I came across the following article: https://www.codeproject.com/Articles/1169319/Reducing-Packing-Overhead-in-Matrix-Matrix-Multipl
It describes new packed GEMM APIs of the following form, introduced in MKL:
dest = sgemm_alloc (identifier, m, n, k)
sgemm_pack (identifier, trans, m, n, k, alpha, src, ld, dest)
sgemm_compute (transa, transb, m, n, k, A, lda, B, ldb, beta, C, ldc)
sgemm_free (dest)
This basically avoids the overhead of packing when the same matrix is used multiple times.
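As a rough illustration of why this matters, here is a back-of-envelope estimate. It assumes a single m x k matrix A reused across r calls (r is a symbol introduced here, not taken from the article), each call multiplying A by a different k x n matrix. It also treats one element copy and one flop as comparable units of work, which is a simplification, and it ignores any repacking caused by cache blocking.

```latex
\[
\frac{\text{packing work per call}}{\text{multiply work per call}}
\approx \frac{mk}{2mnk} = \frac{1}{2n},
\qquad
\text{copies of } A \text{ saved over } r \text{ calls} \approx (r-1)\,mk .
\]
```

Under these assumptions, the relative saving is largest when n is small.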
Can we think about adding similar APIs to OpenBLAS?
Thanks
How does OpenBLAS know that arguments are immutable between calls? This assumption should be made outside BLAS, e.g. using memcache(d) and similar things.
Reading further: OpenBLAS does not make a copy of input matrices in a 'native' format for kernels; the actual kernels work on matrices as they are laid out in RAM. The article does not clearly say what is being cached. Could it be the _COPY result in the case of INC_ <> 1? Still, there is better control over it outside BLAS.
You wrote: "OpenBLAS does not make a copy of input matrices in 'native' format for kernels, actual kernels work on matrices as they are laid out in RAM." Please go through the TCOPY and NCOPY kernel code and how they are used in the GEMM kernel code.
The idea here is to avoid the repeated calls to TCOPY/NCOPY kernels if the same matrix is being used repeatedly. Of course, we should not do this during the standard BLAS gemm call.
Instead, the user must be aware of how the matrix is used and should explicitly call the new APIs gemm_alloc, gemm_pack, gemm_compute and gemm_free as required.
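A minimal sketch of that call pattern in C, for illustration only. Every toy_* name, the TOY_MR panel height, and the packed layout are hypothetical; this is not OpenBLAS or MKL code. It handles only row-major, non-transposed inputs and packs only A.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MR 4 /* hypothetical panel height: rows of A stored together */

/* Allocate space for A (m x k), padded to a whole number of TOY_MR-row panels. */
float *toy_sgemm_alloc(int m, int k) {
    int panels = (m + TOY_MR - 1) / TOY_MR;
    return calloc((size_t)panels * TOY_MR * (size_t)k, sizeof(float));
}

/* Copy alpha*A into panel-major order so the compute loop reads it contiguously. */
void toy_sgemm_pack(int m, int k, float alpha, const float *a, int lda, float *dest) {
    for (int i0 = 0; i0 < m; i0 += TOY_MR)
        for (int p = 0; p < k; ++p)
            for (int r = 0; r < TOY_MR; ++r) {
                int i = i0 + r; /* rows past m are zero padding */
                dest[(size_t)i0 * k + (size_t)p * TOY_MR + r] =
                    (i < m) ? alpha * a[(size_t)i * lda + p] : 0.0f;
            }
}

/* C = packed(alpha*A) * B + beta*C, reading A only from the packed buffer. */
void toy_sgemm_compute(int m, int n, int k, const float *packed_a,
                       const float *b, int ldb, float beta, float *c, int ldc) {
    for (int i0 = 0; i0 < m; i0 += TOY_MR)
        for (int j = 0; j < n; ++j) {
            float acc[TOY_MR] = {0};
            const float *panel = packed_a + (size_t)i0 * k;
            for (int p = 0; p < k; ++p)
                for (int r = 0; r < TOY_MR; ++r)
                    acc[r] += panel[(size_t)p * TOY_MR + r] * b[(size_t)p * ldb + j];
            for (int r = 0; r < TOY_MR && i0 + r < m; ++r) {
                float *out = &c[(size_t)(i0 + r) * ldc + j];
                /* beta == 0 means C is write-only, as in BLAS */
                *out = acc[r] + (beta == 0.0f ? 0.0f : beta * *out);
            }
        }
}

void toy_sgemm_free(float *dest) { free(dest); }

/* Pack A once, reuse it for several B matrices, compare with a plain loop. */
int toy_demo(void) {
    enum { M = 5, N = 2, K = 3, CALLS = 2 };
    float a[M * K], b[CALLS][K * N];
    for (int i = 0; i < M * K; ++i) a[i] = (float)(i % 7 - 3);
    for (int t = 0; t < CALLS; ++t)
        for (int i = 0; i < K * N; ++i) b[t][i] = (float)(i + t + 1);

    float *packed = toy_sgemm_alloc(M, K);
    assert(packed != NULL);
    toy_sgemm_pack(M, K, 2.0f, a, K, packed); /* the only copy of A */

    int mismatches = 0;
    for (int t = 0; t < CALLS; ++t) { /* no repacking between calls */
        float c[M * N];
        toy_sgemm_compute(M, N, K, packed, b[t], N, 0.0f, c, N);
        for (int i = 0; i < M; ++i)
            for (int j = 0; j < N; ++j) {
                float sum = 0.0f;
                for (int p = 0; p < K; ++p)
                    sum += (2.0f * a[i * K + p]) * b[t][p * N + j];
                mismatches += (c[i * N + j] != sum);
            }
    }
    toy_sgemm_free(packed);
    return mismatches;
}
```

toy_demo() returns the number of mismatches between the results computed from the single packed copy and a plain triple loop. The caller, not the library, guarantees that A does not change between the pack call and the compute calls.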
Also, please note that the article is written by Kazushige Goto.
@ashwinyes, I think OpenBLAS can add these APIs.
Does OpenBLAS have these pack APIs (sgemm_pack, sgemm_compute, ...) now?
Unfortunately not implemented yet (which is why this issue is still open).
Is it possible to make this high priority as sgemm dominates DNN model inference?