Add i?amin() BLAS functions in compatibility layer
I was able to find the corresponding i?amax() functions but not these.
According to the netlib documentation for level-1 BLAS, the i?amin()
routines are not defined.
I also checked the source code for the latest version of the reference BLAS (3.8.0) and confirmed that the routines are not present.
We are trying to port our code from MKL to BLIS, so I thought the compatibility layer would make BLIS a drop-in replacement for MKL. I didn't know that functions such as `i?amin` are MKL extensions.
If you don't want to add unofficial extensions to the compatibility layer, that's fine. But maybe there could be a doc somewhere (forgive me if there is one and I haven't found it) to help with these kinds of ports, e.g. by pointing to the corresponding BLIS functions (+ parameter values) for some popular unofficial extensions? This would make switching from MKL to BLIS much easier (I think I also didn't find `dgemmt` and a couple of other functions in the compatibility layer).
I would think it would take very little to modify `i?amax` into `i?amin`. I don't think there is an `i?amin` equivalent in the BLIS typed or object-based interface (yet).
A question is whether you have a user who needs this. If you want all of MKL's extended functionality (beyond BLAS) in BLIS, this could take quite a bit of work.
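For concreteness, here is a minimal reference sketch of the semantics under discussion: the 1-based index of the vector element with the smallest absolute value, mirroring the netlib `i?amax` conventions. This is not BLIS or MKL code; the name `idamin_ref` and the zero return for invalid arguments are assumptions made for illustration.

```c
#include <math.h>
#include <stddef.h>

/* Reference sketch (not BLIS or MKL code): returns the 1-based index
   of the element of x with the smallest absolute value, mirroring the
   netlib i?amax convention. Returns 0 for n < 1 or incx < 1. */
ptrdiff_t idamin_ref(ptrdiff_t n, const double* x, ptrdiff_t incx)
{
    if (n < 1 || incx < 1) return 0;
    ptrdiff_t imin = 1;
    double    amin = fabs(x[0]);
    for (ptrdiff_t i = 1; i < n; ++i) {
        double a = fabs(x[i * incx]);
        if (a < amin) { amin = a; imin = i + 1; }
    }
    return imin;
}
```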
> We are trying to port our code from MKL to BLIS, so I thought the compatibility layer would make BLIS a drop-in replacement for MKL.
Yes, sorry for the inconvenience. For now, BLIS only provides a BLAS compatibility layer.
> I didn't know that functions such as `i?amin` are MKL extensions.
That makes two of us! I don't use MKL except as a BLAS replacement, and only for benchmarking purposes, so I would be among the last to know what is an MKL extension. `i?amin` would be relatively easy to implement in BLIS. But then I would have to decide whether to provide the corresponding "BLAS" interfaces. I would probably prefer to start a new compatibility layer and place them there, but otherwise they would look and feel like BLAS routines.
> If you don't want to add unofficial extensions to the compatibility layer, that's fine.
I am undecided. Let me consult with my colleagues.
> But maybe there could be a doc somewhere (forgive me if there is one and I haven't found it) to help with these kinds of ports, e.g. by pointing to the corresponding BLIS functions (+ parameter values) for some popular unofficial extensions?
This would require us to know what those popular unofficial extensions are in the first place! :)
> This would make switching from MKL to BLIS much easier (I think I also didn't find `dgemmt` and a couple of other functions in the compatibility layer).
I'm unfamiliar with `dgemmt`. A quick Google search suggests this is like `gemm`, but it only updates the lower or upper triangle of C. One or two people have mentioned this in passing to me over the years. This would be relatively straightforward to implement in BLIS.
As Robert mentions, we would probably only initially provide support for operations that were requested since MKL likely has a ton of extra-BLAS routines.
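To make that description concrete, here is a hedged sketch of what a `dgemmt`-like routine computes: a general matrix product whose result is written only to one triangle of C. This is not the MKL or BLIS implementation; it assumes row-major storage, no transpose options, and the name `dgemmt_ref` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical reference sketch of dgemmt-style semantics:
   C := alpha*A*B + beta*C for square n x n C, but only the lower
   triangle (uplo == 'L') or upper triangle (otherwise) of C is
   updated; the rest of C is left untouched. Row-major, no
   transposes, for illustration only. */
void dgemmt_ref(char uplo, ptrdiff_t n, ptrdiff_t k,
                double alpha, const double* A,   /* n x k */
                const double* B,                 /* k x n */
                double beta, double* C)          /* n x n */
{
    for (ptrdiff_t i = 0; i < n; ++i) {
        ptrdiff_t j0 = (uplo == 'L') ? 0 : i;
        ptrdiff_t j1 = (uplo == 'L') ? i : n - 1;
        for (ptrdiff_t j = j0; j <= j1; ++j) {
            double sum = 0.0;
            for (ptrdiff_t p = 0; p < k; ++p)
                sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * sum + beta * C[i * n + j];
        }
    }
}
```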
I'm also not the end user of these functions, just the engineer trying to replace MKL BLAS with BLIS in our code base, so I'm not sure about their usage.
> This would require us to know what those popular unofficial extensions are in the first place! :)
Yep, defining "popular" is not easy.
I mostly had in my mind a request/issue-driven approach, one step at a time.
For example, if `i?amin` and `dgemmt` are trivial/easy, then why not add them to the existing or a new compatibility layer.
And in the long term this will lower the barrier for new users to switch from MKL BLAS to BLIS.
I am interested in helping implement MKL extensions in BLIS to ensure compatibility. It is ideal if there is a GitHub issue for each function or collection of related functions.
DGEMMT is in both MKL and ESSL. MKL describes it as "Computes a matrix-matrix product with general matrices but updates only the upper or lower triangular part of the result matrix." A good implementation of this will take some work.
- https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-fortran/top/blas-and-sparse-blas-routines/blas-like-extensions/gemmt.html
- https://www.ibm.com/support/knowledgecenter/SSFHY8_6.2/reference/am5gr_dgemmt.html
Correct me if I'm wrong, but isn't `gemmt` just `syr2k` with some stuff deleted?
It is important to understand where gemmt comes from.
Let’s say we want to compute C = A B A^T + C, where C and B are symmetric.
Because the operation is not part of the BLAS, what people do is compute the intermediate result
D = A B
and then update C := D A^T + C, updating only half of C.
Obviously, the operation C = A B A^T + C should instead be supported by the BLAS, and if that is done, then the implementation will be more efficient (since it can be built from the BLIS building blocks) and will not require an intermediate matrix.
So, the RIGHT way to solve this is to see where gemmt is used, to make a list of these use cases, to make this a set of honors projects for undergraduates, and to get rid of gemmt as an interface.
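The two-step workaround described above can be sketched as follows. This is illustration only, not a library routine: row-major, square n x n matrices, a small fixed-size scratch buffer, and the name `abat_two_step_lower` is hypothetical.

```c
#include <stddef.h>

/* Sketch of the two-step workaround: form the intermediate D = A B,
   then update only the lower triangle of C with C := D A^T + C.
   Row-major, square n x n matrices; the fixed-size scratch buffer
   assumes n <= 4 for this sketch. */
void abat_two_step_lower(ptrdiff_t n, const double* A,
                         const double* B, double* C)
{
    double D[16];                               /* D = A B */
    for (ptrdiff_t i = 0; i < n; ++i)
        for (ptrdiff_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (ptrdiff_t p = 0; p < n; ++p)
                s += A[i * n + p] * B[p * n + j];
            D[i * n + j] = s;
        }
    /* lower triangle of C += D A^T */
    for (ptrdiff_t i = 0; i < n; ++i)
        for (ptrdiff_t j = 0; j <= i; ++j) {
            double s = 0.0;
            for (ptrdiff_t p = 0; p < n; ++p)
                s += D[i * n + p] * A[j * n + p];
            C[i * n + j] += s;
        }
}
```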
@rvdg Yes, and that is an operation that I would very much like to see as well. But from an implementation standpoint, assuming `gemmt` as a given, starting from `syr2k` and whacking some stuff off seems like the same thing and pretty easy to do.
Field,
Don’t we have that operation in libflame? (A B A^T). Or somewhere?
> starting from `syr2k` and whacking some stuff off seems like the same thing and pretty easy to do.
Agreed. This is how I would approach it. The `syrk` macrokernel is oblivious to the contents of A and B, so 98% of the work for `gemmt` has already been done as part of supporting `syrk` and `syr2k`.
But let's all keep in mind that proper support in BLIS for `gemmt`, while straightforward, will be more work than it looks. There's the implementation, the object and typed APIs, the compatibility API, potentially other implementation details that nobody likes (nor should have) to talk about, testsuite support, test driver support, API documentation... just to name the ones I can think of. And because of this, I should probably be the one to take the lead.
> Don’t we have that operation in libflame? (A B A^T). Or somewhere?
Not that I'm aware of.
I am explicitly linking in Paolo to this conversation, since their Linnea work dovetails with this.
Paolo pointed me to this discussion because Linnea was mentioned.
If one wants to compute nothing but A B A^T, then a dedicated kernel for this operation might be more useful than `gemmt`. However, there is a scenario where `gemmt` provides some additional benefits: if A B or B A^T shows up somewhere else in an expression, then `gemmt` makes it possible to save work by computing A B or B A^T only once. This is not possible with a kernel for A B A^T that does not expose the intermediate matrix. In the set of application problems we use for our experiments, we have cases like this.
If you decide to implement a kernel for A B A^T, I think it would be good to implement some variants for different properties of B: In our experiments, we have expressions where B is diagonal. While I'm currently not aware of such expressions, B could also be banded.
> Correct me if I'm wrong, but isn't `gemmt` just `syr2k` with some stuff deleted?
I would even say that `syr2k` and `gemmt` are the same (except for alpha):
```
A B A^T = 1/2 A B A^T + 1/2 A B A^T
        = 1/2 A (A B)^T + 1/2 (A B) A^T     (B symmetric)
        = 1/2 A M^T + 1/2 M A^T,  with M = A B
```
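The identity can also be checked numerically. The sketch below (plain triple loops, hypothetical helper names, small fixed-size buffers for illustration) returns the largest entrywise gap between A B A^T and (1/2) A M^T + (1/2) M A^T with M = A B, for a symmetric B:

```c
#include <math.h>
#include <stddef.h>

/* Plain row-major matrix product Z = X Y, n x n. Illustration only. */
static void matmul(ptrdiff_t n, const double* X, const double* Y, double* Z)
{
    for (ptrdiff_t i = 0; i < n; ++i)
        for (ptrdiff_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (ptrdiff_t p = 0; p < n; ++p)
                s += X[i * n + p] * Y[p * n + j];
            Z[i * n + j] = s;
        }
}

/* Max abs difference between A B A^T (computed as A (B A^T)) and
   (1/2) A M^T + (1/2) M A^T with M = A B. Buffers assume n <= 4. */
double syr2k_identity_gap(ptrdiff_t n, const double* A, const double* B)
{
    double M[16], Mt[16], At[16], T1[16], Lhs[16], R1[16], R2[16];
    matmul(n, A, B, M);                     /* M = A B */
    for (ptrdiff_t i = 0; i < n; ++i)
        for (ptrdiff_t j = 0; j < n; ++j) {
            Mt[i * n + j] = M[j * n + i];   /* M^T */
            At[i * n + j] = A[j * n + i];   /* A^T */
        }
    matmul(n, B, At, T1);                   /* T1  = B A^T */
    matmul(n, A, T1, Lhs);                  /* Lhs = A B A^T */
    matmul(n, A, Mt, R1);                   /* R1  = A M^T */
    matmul(n, M, At, R2);                   /* R2  = M A^T */
    double gap = 0.0;
    for (ptrdiff_t i = 0; i < n * n; ++i) {
        double d = fabs(Lhs[i] - 0.5 * (R1[i] + R2[i]));
        if (d > gap) gap = d;
    }
    return gap;
}
```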
In my experience with actual applications, the expression A B A^T is by far the one that shows up the most (together with A B^-1 A^T). One should distinguish two scenarios (contexts): 1) the expression appears as part of an assignment such as the one Robert wrote down: C := A B A^T; 2) the expression is part of an inversion M := (... A B A^T ...)^-1. Obviously the two cases have to be treated differently, as one does not want to actually compute M.

As for case 1, everything depends on the structure and properties of B. Let's stay within the symmetric case. Very, very often B is diagonal. I have also seen it be tridiagonal, banded, block-diagonal, and full. Basically, it can be anything. In all cases, if B is SPD, then the implementation is obvious (via Cholesky). If it's not, one could consider the L D L^T factorization, but it's not as efficient as one would like, hence the syr2k trick discussed here. However, the syr2k solution does not work with A B^-1 A^T, while the L D L^T one does. All in all, it's easy to see that this is a monster of an operation in terms of different cases and different implementations.
> In all cases, if B is SPD, then the implementation is obvious (via Cholesky).
Good point. That made me wonder how an implementation with Cholesky compares to one with `gemmt` if there are additional occurrences of A B or B A^T: if I'm not mistaken, there could be cases where the implementation with `gemmt` requires fewer FLOPs.
Is there a BLIS equivalent to `tpsv`? It seems from `bla_tpsv.c` that it is not implemented in the compatibility layer, and I can't seem to find it in the BLIS documentation.
@gmargari could you create a separate issue for this?
Yes, of course. I thought I'd ask first in case it's somewhere I missed.