OpenBLAS: Adding packed gemm APIs

Hi Xianyi,

Came across the following article. https://www.codeproject.com/Articles/1169319/Reducing-Packing-Overhead-in-Matrix-Matrix-Multipl

It describes new packed APIs of the following form in MKL:

    dest = sgemm_alloc (identifier, m, n, k)
    sgemm_pack (identifier, trans, m, n, k, alpha, src, ld, dest)
    sgemm_compute (transa, transb, m, n, k, A, lda, B, ldb, beta, C, ldc)
    sgemm_free (dest)

This avoids paying the packing overhead again every time the same matrix is used in another multiplication.
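To make the alloc/pack/compute/free workflow concrete, here is a minimal single-threaded sketch in C. The toy_sgemm_* names are hypothetical and are neither MKL's nor OpenBLAS's API. Only A can be packed, there is no transpose handling, and the "packed" layout is simply alpha*A stored row by row, not the cache-blocked panels a real library would use. As in the MKL listing, alpha is applied at pack time, so compute takes only beta.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy versions of the alloc/pack/compute/free workflow.
 * Only A (identifier 'A') can be packed, matrices are column-major,
 * and transposes are not supported. */

/* Allocate a buffer large enough for a packed m x k matrix A. */
float *toy_sgemm_alloc(char identifier, int m, int n, int k)
{
    (void)n; /* not needed when only A is packed */
    assert(identifier == 'A');
    return malloc((size_t)m * (size_t)k * sizeof(float));
}

/* Store alpha*A in dest row by row, so the inner loop of
 * toy_sgemm_compute reads each row of A contiguously. */
void toy_sgemm_pack(char identifier, int m, int n, int k, float alpha,
                    const float *src, int ld, float *dest)
{
    (void)n;
    assert(identifier == 'A');
    for (int i = 0; i < m; i++)
        for (int p = 0; p < k; p++)
            dest[(size_t)i * k + p] = alpha * src[i + (size_t)p * ld];
}

/* C = packedA * B + beta * C; alpha was already applied by the pack step.
 * As in BLAS, beta == 0 means C is overwritten without being read. */
void toy_sgemm_compute(int m, int n, int k, const float *packed_a,
                       const float *b, int ldb, float beta,
                       float *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            const float *row = packed_a + (size_t)i * k;
            float sum = 0.0f;
            for (int p = 0; p < k; p++)
                sum += row[p] * b[p + (size_t)j * ldb];
            float *cij = &c[i + (size_t)j * ldc];
            *cij = (beta == 0.0f) ? sum : sum + beta * *cij;
        }
    }
}

/* Release a buffer obtained from toy_sgemm_alloc. */
void toy_sgemm_free(float *dest)
{
    free(dest);
}
```

The point of the split is that toy_sgemm_pack runs once while toy_sgemm_compute can then be called many times with different B matrices against the same packed A. In MKL's real version, a packed operand is flagged by passing 'P' as transa or transb to sgemm_compute, in which case the corresponding leading dimension is ignored.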

Could we consider adding similar APIs to OpenBLAS?

Thanks

Feb 11 '17 03:02 ashwinyes

How does OpenBLAS know that the arguments are unchanged between calls? That assumption should be made outside BLAS, e.g. using memcache(d) and similar mechanisms.
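One way to keep that decision outside BLAS is a caller-side cache that repacks only when the caller reports that the matrix changed. The following is a minimal sketch, assuming the caller increments a version counter after every write; pack_cache, pack_cache_get and pack_stub are hypothetical names, and pack_stub is a plain copy standing in for a real packing routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical caller-side cache. BLAS cannot tell whether the caller has
 * overwritten a matrix, so the caller bumps `version` after every write to
 * `src`, and the cache repacks only when the version has moved on. */
typedef struct {
    const float *src;        /* matrix owned by the caller */
    size_t len;              /* number of elements in src */
    unsigned version;        /* incremented by the caller on each write */
    unsigned packed_version; /* version that `packed` currently reflects */
    int valid;               /* nonzero once `packed` has been filled */
    float *packed;           /* caller-provided buffer of len floats */
    int pack_count;          /* how many times packing actually ran */
} pack_cache;

/* Stand-in for a real packing routine: here just a plain copy. */
static void pack_stub(const float *src, size_t len, float *dest)
{
    memcpy(dest, src, len * sizeof(float));
}

/* Return the packed copy, repacking only if the caller reported a change. */
const float *pack_cache_get(pack_cache *pc)
{
    assert(pc->src != NULL && pc->packed != NULL);
    if (!pc->valid || pc->packed_version != pc->version) {
        pack_stub(pc->src, pc->len, pc->packed);
        pc->packed_version = pc->version;
        pc->valid = 1;
        pc->pack_count++;
    }
    return pc->packed;
}
```

If the caller writes to the matrix but forgets to bump the version, the cache silently returns stale data. That bookkeeping depends on knowledge only the caller has, which a stateless BLAS call cannot infer.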

Feb 11 '17 22:02 brada4

Reading further: OpenBLAS does not make a copy of input matrices in a 'native' format for the kernels; the actual kernels work on matrices as they are laid out in RAM. The article does not clearly say what is being cached. Could it be the _COPY result in the case INC_ <> 1? Either way, there is better control over this outside BLAS.

Feb 12 '17 20:02 brada4

Regarding "OpenBLAS does not make a copy of input matrices in 'native' format for kernels, actual kernels work on matrices as they are laid out in RAM": please go through the TCOPY and NCOPY kernel code and how those routines are used in the GEMM kernel code.
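For readers unfamiliar with these routines, the following simplified C sketch shows the kind of rearrangement a GEMM copy routine performs: it copies a column-major block into contiguous panels of nr columns so that a micro-kernel can stream through memory sequentially, zero-padding the last partial panel. It is written from the general GotoBLAS packing idea, not from OpenBLAS's actual ncopy source, and panel_pack is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified panel packing in the spirit of a GEMM "ncopy" routine
 * (not OpenBLAS's actual kernel). The m x n column-major block `a`
 * (leading dimension lda) is split into panels of nr columns; inside a
 * panel, the nr values of each row are stored next to each other.
 * A partial last panel is zero-padded, so `b` must hold
 * m * (ceil(n / nr) * nr) floats. */
void panel_pack(int m, int n, const float *a, int lda, int nr, float *b)
{
    assert(nr > 0);
    for (int j0 = 0; j0 < n; j0 += nr) {         /* one panel per nr columns */
        for (int i = 0; i < m; i++) {            /* walk the rows of the panel */
            for (int jj = 0; jj < nr; jj++) {    /* interleave the panel's columns */
                int j = j0 + jj;
                *b++ = (j < n) ? a[i + (size_t)j * lda] : 0.0f;
            }
        }
    }
}
```

A buffer like `b` is what gets rebuilt on every standard gemm call; the proposed APIs would let a caller build it once and reuse it.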

The idea is to avoid calling the TCOPY/NCOPY kernels again and again when the same matrix is used in many multiplications. Of course, this should not be done inside the standard BLAS gemm call.

Instead, the user, who knows how the matrix is used, should explicitly call the new APIs gemm_alloc, gemm_pack, gemm_compute and gemm_free as required.

Also, please note that the article was written by Kazushige Goto.

Feb 13 '17 05:02 ashwinyes

@ashwinyes, I think OpenBLAS can add these APIs.