OpenBLAS
Request GEMMT API
Intel MKL provides an additional GEMM variant, GEMMT, that updates only the upper or lower triangular part of the result matrix. This would be a great addition to OpenBLAS.
https://software.intel.com/en-us/node/590047
You mean - to wrap _symm with 2 transforms?
I believe that the difference is more than wrapping SYMM with two transforms, but maybe I am misunderstanding your question.
SYMM is described as C := alpha*A*B + beta*C https://software.intel.com/en-us/node/468488
GEMMT is described as C := alpha*op(A)*op(B) + beta*C https://software.intel.com/en-us/node/590047
with different limitations on A, B, and C. For GEMMT, the upper triangular or lower triangular part of C is overwritten by the respective part of the result.
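To make the distinction concrete, here is a minimal, unoptimized sketch of what GEMMT computes, assuming column-major storage and showing only the no-transpose, lower-triangular case; the function name is just illustrative, not anything from OpenBLAS or MKL:

```c
#include <stddef.h>

/* Reference sketch (not a real implementation):
 * C := alpha*op(A)*op(B) + beta*C, updating only the lower triangle of C.
 * Shown for the no-transpose case with column-major storage:
 * A is n x k, B is k x n, C is n x n; only entries with i >= j are touched. */
static void gemmt_lower_nn_ref(size_t n, size_t k, double alpha,
                               const double *A, size_t lda,
                               const double *B, size_t ldb,
                               double beta, double *C, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = j; i < n; ++i) {          /* lower triangle only */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}
```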
I wrote DGEMMT in Julia and in the no-transpose case it became a plain DSYMM call... It is really a kick-ass job they did with LLVM.
This would be nice! :)
Hi.
I'm curious if anyone's still tuned-in for this request.
To restate the problem, GEMMT computes only a part of GEMM.
It's different from SYMM in my opinion, as neither of the two operands A and B has symmetry restrictions.
Rather, it's more similar to SYR2K, but with only the A*transpose(B) term.
GEMMT is beyond the BLAS standard, but I guess its implementation could be very close to SYRK/HERK?
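For context, without GEMMT a caller has to compute the full product and then keep only one triangle, wasting roughly half the flops. A rough sketch of that fallback, using the standard cblas_dgemm that OpenBLAS already provides (the helper name and the column-major, no-transpose restriction are just for illustration):

```c
#include <stdlib.h>
#include <cblas.h>

/* Hypothetical fallback when no gemmt is available: compute the full n x n
 * product with cblas_dgemm into a scratch buffer, then fold only the lower
 * triangle back into C. Column-major, no-transpose case; error handling
 * omitted for brevity. */
static void gemmt_lower_nn_fallback(int n, int k, double alpha,
                                    const double *A, int lda,
                                    const double *B, int ldb,
                                    double beta, double *C, int ldc)
{
    double *T = malloc((size_t)n * (size_t)n * sizeof *T);
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                n, n, k, alpha, A, lda, B, ldb, 0.0, T, n);
    for (int j = 0; j < n; ++j)          /* keep only the lower triangle */
        for (int i = j; i < n; ++i)
            C[i + j * ldc] = T[(size_t)i + (size_t)j * n] + beta * C[i + j * ldc];
    free(T);
}
```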
Not implemented yet, but not forgotten either. I agree that it does look more like syr2k than symm.
Fwiw, MUMPS solver ( http://mumps.enseeiht.fr/doc/userguide_5.4.1.pdf ) would benefit a lot: "We strongly recommend to use this ability if your BLAS library enables it"
I'd like to second that: implementing the GEMMT feature would be great :).
ReLAPACK provides this: https://github.com/HPAC/ReLAPACK/blob/master/src/dgemmt.c
Thanks for the pointer. I still believe this feature might be offered by the BLAS itself (as it is by other BLAS implementations on the market).
Not the reference one: https://github.com/Reference-LAPACK/lapack/search?q=dgemmt shows no hits.
Agreed, however when looking at MKL and BLIS you can see it supported there, as it enables extra performance on various workloads (the MUMPS direct sparse solver, for instance); it seems like low-hanging fruit.
Also, I understand why you gave the pointer to ReLAPACK, thanks for sharing it.
Lots of low hanging fruit but never enough pickers (happens in real-world orchards too). ReLAPACK is included in OpenBLAS as a build-time option, but the gemmt there is not built by default (even in the original ReLAPACK source, see its config.h). I have not gotten around to checking if Peise's algorithm there actually works and is efficient.
fair point @martin-frbg; I'll stop arguing as I cannot devote time to help :)
Looks like this is the last blocker for 0.3.21, should this have been closed by #3548?
There are comments above comparing gemmt to syr2k. I think it is more similar to syrkx provided by cuBLAS and rocBLAS. See the links below:
- https://docs.nvidia.com/cuda/cublas/#cublas-t-syrkx
- https://docs.amd.com/bundle/rocBLAS-User-Guide---rocBLAS-documentation/page/API_Reference_Guide_80.html#rocblas-xsyrkx-batched-strided-batched
syrkx allows op(A)*op(B)^T with the same op applied to both operands, whereas gemmt allows opa(A)*opb(B) with independent ops. If one thinks in terms of gemm, gemmt allows NN, NT, TN, and TT; syrkx only allows NT or TN.
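To illustrate the relationship, here is a sketch of a syrkx-style operation expressed as a restricted gemmt call. It assumes an MKL-style cblas_dgemmt prototype (layout, uplo, transa, transb, n, k, alpha, A, lda, B, ldb, beta, C, ldc); check the installed cblas.h for the exact signature, and the wrapper name is made up:

```c
#include <cblas.h>

/* Sketch: C := alpha*op(A)*op(B)^T + beta*C on one triangle (syrkx-style),
 * expressed via gemmt. syrkx applies the same op to both operands, so only
 * the NT and TN combinations are reachable; gemmt takes transa and transb
 * independently. Prototype of cblas_dgemmt assumed MKL-compatible. */
static void dsyrkx_via_gemmt(CBLAS_UPLO uplo, CBLAS_TRANSPOSE trans,
                             int n, int k, double alpha,
                             const double *A, int lda,
                             const double *B, int ldb,
                             double beta, double *C, int ldc)
{
    /* trans == NoTrans -> A*B^T (NT); trans == Trans -> A^T*B (TN) */
    CBLAS_TRANSPOSE transb = (trans == CblasNoTrans) ? CblasTrans : CblasNoTrans;
    cblas_dgemmt(CblasColMajor, uplo, trans, transb,
                 n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}
```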
Thanks. GEMMT was added in #3796, to be released with 0.3.22; apparently I neglected to add the backlink to this issue in the PR.