Add i?amin() BLAS functions in compatibility layer
I was able to find the corresponding i?amax() functions but not these.
According to the netlib documentation for level-1 BLAS, the i?amin()
routines are not defined.
I also checked the source code for the latest version of the reference BLAS (3.8.0) and confirmed that the routines are not present.
We are trying to port our code from MKL to BLIS, so I thought the compatibility layer would make BLIS a drop-in replacement for MKL. I didn't know that functions such as `i?amin` are MKL extensions.
If you don't want to add unofficial extensions to the compatibility layer, that's fine. But maybe there could be a doc somewhere (forgive me if there is one and I haven't found it) to help with these kinds of ports, e.g. by pointing to the corresponding BLIS functions (+ parameter values) for some popular unofficial extensions? This would make switching from MKL to BLIS much easier (I think I also didn't find `dgemmt` and a couple of other functions in the compatibility layer).
I would think it would take very little to modify `i?amax` into `i?amin`. I don't think there is an `i?amin` equivalent in the BLIS typed or object-based interface (yet).
A question is whether you have a user who needs this. If you want all of MKL's extended functionality (beyond BLAS) in BLIS, this could take quite a bit of work.
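For concreteness, here is a minimal reference sketch of the semantics under discussion: the 1-based index of the vector element with the smallest absolute value, mirroring the netlib `i?amax` conventions. This is not BLIS or MKL code; the name `idamin_ref` and the zero return for invalid arguments are assumptions made for illustration.

```c
#include <math.h>
#include <stddef.h>

/* Reference sketch (not BLIS or MKL code): returns the 1-based index
   of the element of x with the smallest absolute value, mirroring the
   netlib i?amax convention. Returns 0 for n < 1 or incx < 1. */
ptrdiff_t idamin_ref(ptrdiff_t n, const double* x, ptrdiff_t incx)
{
    if (n < 1 || incx < 1) return 0;
    ptrdiff_t imin = 1;
    double    amin = fabs(x[0]);
    for (ptrdiff_t i = 1; i < n; ++i) {
        double a = fabs(x[i * incx]);
        if (a < amin) { amin = a; imin = i + 1; }
    }
    return imin;
}
```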
> We are trying to port our code from MKL to BLIS, so I thought the compatibility layer would make BLIS a drop-in replacement for MKL.
Yes, sorry for the inconvenience. For now, BLIS only provides a BLAS compatibility layer.
> I didn't know that functions such as `i?amin` are MKL extensions.
That makes two of us! I don't use MKL except as a BLAS replacement, and only for benchmarking purposes, so I would be among the last to know what is an MKL extension. `i?amin` would be relatively easy to implement in BLIS. But then I would have to decide whether to provide the corresponding "BLAS" interfaces. I would probably prefer to start a new compatibility layer and place them there, but otherwise they would look and feel like BLAS routines.
> If you don't want to add unofficial extensions to the compatibility layer, that's fine.
I am undecided. Let me consult with my colleagues.
> But maybe there could be a doc somewhere (forgive me if there is one and I haven't found it) to help with these kinds of ports, e.g. by pointing to the corresponding BLIS functions (+ parameter values) for some popular unofficial extensions?
This would require us to know what those popular unofficial extensions are in the first place! :)
> This would make switching from MKL to BLIS much easier (I think I also didn't find `dgemmt` and a couple of other functions in the compatibility layer).
I'm unfamiliar with `dgemmt`. A quick Google search suggests this is like `gemm`, but it only updates the lower or upper triangle of C. One or two people have mentioned this in passing to me over the years. This would be relatively straightforward to implement in BLIS.
As Robert mentions, we would probably only initially provide support for operations that were requested since MKL likely has a ton of extra-BLAS routines.
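To make that description concrete, here is a hedged sketch of what a `dgemmt`-like routine computes: a general matrix product whose result is written only to one triangle of C. This is not the MKL or BLIS implementation; it assumes row-major storage, no transpose options, and the name `dgemmt_ref` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical reference sketch of dgemmt-style semantics:
   C := alpha*A*B + beta*C for square n x n C, but only the lower
   triangle (uplo == 'L') or upper triangle (otherwise) of C is
   updated; the rest of C is left untouched. Row-major, no
   transposes, for illustration only. */
void dgemmt_ref(char uplo, ptrdiff_t n, ptrdiff_t k,
                double alpha, const double* A,   /* n x k */
                const double* B,                 /* k x n */
                double beta, double* C)          /* n x n */
{
    for (ptrdiff_t i = 0; i < n; ++i) {
        ptrdiff_t j0 = (uplo == 'L') ? 0 : i;
        ptrdiff_t j1 = (uplo == 'L') ? i : n - 1;
        for (ptrdiff_t j = j0; j <= j1; ++j) {
            double sum = 0.0;
            for (ptrdiff_t p = 0; p < k; ++p)
                sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * sum + beta * C[i * n + j];
        }
    }
}
```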
I'm also not the end user of these functions, just the engineer trying to replace MKL BLAS with BLIS in our code base, so I'm not sure about their usage.
> This would require us to know what those popular unofficial extensions are in the first place! :)
Yep, defining "popular" is not easy.
I mostly had in my mind a request/issue-driven approach, one step at a time.
For example, if `i?amin` and `dgemmt` are trivial/easy, then why not add them to the existing or a new compatibility layer.
And in the long term this will lower the barrier for new users to switch from MKL BLAS to BLIS.
I am interested in helping implement MKL extensions in BLIS to ensure compatibility. It is ideal if there is a GitHub issue for each function or collection of related functions.
DGEMMT is in both MKL and ESSL. MKL describes it as "Computes a matrix-matrix product with general matrices but updates only the upper or lower triangular part of the result matrix." A good implementation of this will take some work.
- https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-fortran/top/blas-and-sparse-blas-routines/blas-like-extensions/gemmt.html
- https://www.ibm.com/support/knowledgecenter/SSFHY8_6.2/reference/am5gr_dgemmt.html
Correct me if I'm wrong, but isn't `gemmt` just `syr2k` with some stuff deleted?
It is important to understand where gemmt comes from.
Let’s say we want to compute C = A B A^T + C, where C and B are symmetric.
Because the operation is not part of the BLAS, what people do is compute the intermediate result
D = A B
and then update C := D A^T + C, updating only half of C.
Obviously, the operation C = A B A^T + C should instead be supported by the BLAS, and if that is done, then the implementation will be more efficient (since it can be built from the BLIS building blocks) and will not require an intermediate matrix.
So, the RIGHT way to solve this is to see where gemmt is used, to make a list of these use cases, to make this a set of honors projects for undergraduates, and to get rid of gemmt as an interface.
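The two-step workaround described above can be sketched as follows. This is illustration only, not a library routine: row-major, square n x n matrices, a small fixed-size scratch buffer, and the name `abat_two_step_lower` is hypothetical.

```c
#include <stddef.h>

/* Sketch of the two-step workaround: form the intermediate D = A B,
   then update only the lower triangle of C with C := D A^T + C.
   Row-major, square n x n matrices; the fixed-size scratch buffer
   assumes n <= 4 for this sketch. */
void abat_two_step_lower(ptrdiff_t n, const double* A,
                         const double* B, double* C)
{
    double D[16];                               /* D = A B */
    for (ptrdiff_t i = 0; i < n; ++i)
        for (ptrdiff_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (ptrdiff_t p = 0; p < n; ++p)
                s += A[i * n + p] * B[p * n + j];
            D[i * n + j] = s;
        }
    /* lower triangle of C += D A^T */
    for (ptrdiff_t i = 0; i < n; ++i)
        for (ptrdiff_t j = 0; j <= i; ++j) {
            double s = 0.0;
            for (ptrdiff_t p = 0; p < n; ++p)
                s += D[i * n + p] * A[j * n + p];
            C[i * n + j] += s;
        }
}
```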
@rvdg Yes, and that is an operation that I would very much like to see as well. But from an implementation standpoint, assuming `gemmt` as a given, starting from `syr2k` and whacking some stuff off seems like the same thing and pretty easy to do.
Field,
Don’t we have that operation in libflame? (A B A^T). Or somewhere?
> starting from `syr2k` and whacking some stuff off seems like the same thing and pretty easy to do.
Agreed. This is how I would approach it. The `syrk` macrokernel is oblivious to the contents of A and B, so 98% of the work for `gemmt` has already been done as part of supporting `syrk` and `syr2k`.
But let's all keep in mind that proper support in BLIS for `gemmt`, while straightforward, will be more work than it looks. There's the implementation, the object and typed APIs, the compatibility API, potentially other implementation details that nobody likes (nor should have) to talk about, testsuite support, test driver support, API documentation... just to name the ones I can think of. And because of this, I should probably be the one to take the lead.
> Don’t we have that operation in libflame? (A B A^T). Or somewhere?
Not that I'm aware of.
I am explicitly linking in Paolo to this conversation, since their Linnea work dovetails with this.
Paolo pointed me to this discussion because Linnea was mentioned.
If one wants to compute nothing but A B A^T, then a dedicated kernel for this operation might be more useful than `gemmt`. However, there is a scenario where `gemmt` provides some additional benefits: if A B or B A^T shows up somewhere else in an expression, then `gemmt` makes it possible to save work by computing A B or B A^T only once. This is not possible with a kernel for A B A^T that does not expose the intermediate matrix. In the set of application problems we use for our experiments, we have cases like this.
If you decide to implement a kernel for A B A^T, I think it would be good to implement some variants for different properties of B: In our experiments, we have expressions where B is diagonal. While I'm currently not aware of such expressions, B could also be banded.
> Correct me if I'm wrong, but isn't `gemmt` just `syr2k` with some stuff deleted?
I would even say that `syr2k` and `gemmt` are the same (except for alpha):
```
A B A^T = 1/2 A B A^T + 1/2 A B A^T
        = 1/2 A (A B)^T + 1/2 (A B) A^T     (B symmetric)
        = 1/2 A M^T + 1/2 M A^T,  with M = A B
```
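The identity can also be checked numerically. The sketch below (plain triple loops, hypothetical helper names, small fixed-size buffers for illustration) returns the largest entrywise gap between A B A^T and (1/2) A M^T + (1/2) M A^T with M = A B, for a symmetric B:

```c
#include <math.h>
#include <stddef.h>

/* Plain row-major matrix product Z = X Y, n x n. Illustration only. */
static void matmul(ptrdiff_t n, const double* X, const double* Y, double* Z)
{
    for (ptrdiff_t i = 0; i < n; ++i)
        for (ptrdiff_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (ptrdiff_t p = 0; p < n; ++p)
                s += X[i * n + p] * Y[p * n + j];
            Z[i * n + j] = s;
        }
}

/* Max abs difference between A B A^T (computed as A (B A^T)) and
   (1/2) A M^T + (1/2) M A^T with M = A B. Buffers assume n <= 4. */
double syr2k_identity_gap(ptrdiff_t n, const double* A, const double* B)
{
    double M[16], Mt[16], At[16], T1[16], Lhs[16], R1[16], R2[16];
    matmul(n, A, B, M);                     /* M = A B */
    for (ptrdiff_t i = 0; i < n; ++i)
        for (ptrdiff_t j = 0; j < n; ++j) {
            Mt[i * n + j] = M[j * n + i];   /* M^T */
            At[i * n + j] = A[j * n + i];   /* A^T */
        }
    matmul(n, B, At, T1);                   /* T1  = B A^T */
    matmul(n, A, T1, Lhs);                  /* Lhs = A B A^T */
    matmul(n, A, Mt, R1);                   /* R1  = A M^T */
    matmul(n, M, At, R2);                   /* R2  = M A^T */
    double gap = 0.0;
    for (ptrdiff_t i = 0; i < n * n; ++i) {
        double d = fabs(Lhs[i] - 0.5 * (R1[i] + R2[i]));
        if (d > gap) gap = d;
    }
    return gap;
}
```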
In my experience with actual applications, the expression A B A^T is by far the one that shows up the most (together with A B^-1 A^T). One should distinguish two scenarios (contexts): 1) the expression appears as part of an assignment such as the one Robert wrote down: C := A B A^T; 2) the expression is part of an inversion M := (... A B A^T ...)^-1. Obviously the two cases have to be treated differently, as one does not want to actually compute M.

As for case 1, everything depends on the structure and properties of B. Let's stay within the symmetric case. Very, very often B is diagonal. I have also seen it be tridiagonal, banded, block-diagonal, and full. Basically, it can be anything. In all cases, if B is SPD, then the implementation is obvious (via Cholesky). If it's not, one could consider the L D L^T factorization, but it's not as efficient as one would like, hence the syr2k trick discussed here. However, the syr2k solution does not work with A B^-1 A^T, while the L D L^T one does. All in all, it's easy to see that this is a monster of an operation in terms of different cases and different implementations.
> In all cases, if B is SPD, then the implementation is obvious (via Cholesky).
Good point. That made me wonder how an implementation with Cholesky compares to one with `gemmt` if there are additional occurrences of A B or B A^T: if I'm not mistaken, there could be cases where the implementation with `gemmt` requires fewer FLOPs.
Is there a BLIS equivalent to `tpsv`? It seems from `bla_tpsv.c` that it is not implemented in the compatibility layer, and I can't seem to find it in the BLIS documentation.
@gmargari could you create a separate issue for this?
Yes, of course. I thought I'd ask first in case it's somewhere I missed.