OpenBLAS
[WIP] forward GEMM workloads to GEMV when one argument is actually a vector
fixes #4580 and fixes #528
obviously I don't really intend to kick out the recent Loongson patch here - this first draft was thrown together off-grid in an outdated fork
CodSpeed Performance Report
Merging #4708 will not alter performance
Comparing martin-frbg:issue4580 (c2a9b19) with develop (700ea74)
Summary
✅ 16 untouched benchmarks
@martin-frbg, thank you for the PR. Just to be sure about how the data is sent to GEMV: for example, when A is a 1xn matrix and B is nxk, are we flattening A (i.e. converting the matrix to a vector) to make it compatible with GEMV?
Yes in principle, but I am not convinced we actually have to transform the storage of A for that. (Note that the rough draft I posted here may not even compile. I need to update it and flesh it out when I have time)
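To illustrate why A's storage may not need transforming, here is a minimal sketch assuming column-major (Fortran-style) storage and non-transposed operands. It is not the code from this PR: `ref_gemm_nn`, `ref_gemv_t`, `gemm_m1_as_gemv` and `demo_m1_equivalence` are hypothetical reference helpers written only for this example. With A being 1xn, its elements sit `lda` apart in memory, so A can be read in place as a vector with stride `lda`, and C = A*B becomes y = B^T x.

```c
#include <stddef.h>

/* Hypothetical reference GEMM (column-major, no transposes):
   C = alpha*A*B + beta*C with A m x k, B k x n, C m x n. */
static void ref_gemm_nn(int m, int n, int k, double alpha,
                        const double *a, int lda,
                        const double *b, int ldb,
                        double beta, double *c, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            c[i + (size_t)j * ldc] = alpha * acc + beta * c[i + (size_t)j * ldc];
        }
}

/* Hypothetical reference GEMV, transposed (column-major):
   y = alpha*A^T*x + beta*y with A rows x cols, x of length rows, y of length cols. */
static void ref_gemv_t(int rows, int cols, double alpha,
                       const double *a, int lda,
                       const double *x, int incx,
                       double beta, double *y, int incy)
{
    for (int j = 0; j < cols; j++) {
        double acc = 0.0;
        for (int i = 0; i < rows; i++)
            acc += a[i + (size_t)j * lda] * x[(size_t)i * incx];
        y[(size_t)j * incy] = alpha * acc + beta * y[(size_t)j * incy];
    }
}

/* GEMM with a single-row A (1 x inner) times B (inner x cols), written as GEMV.
   A is not copied: it is passed as x with incx = lda, and C as y with incy = ldc. */
static void gemm_m1_as_gemv(int inner, int cols, double alpha,
                            const double *a, int lda,
                            const double *b, int ldb,
                            double beta, double *c, int ldc)
{
    ref_gemv_t(inner, cols, alpha, b, ldb, a, lda, beta, c, ldc);
}

/* Returns 1 when both paths agree on a small example with a padded A (lda = 2). */
static int demo_m1_equivalence(void)
{
    const double a[] = {1, 99, 2, 99, 3, 99}; /* 1x3, lda = 2: row is 1, 2, 3 */
    const double b[] = {1, 2, 3, 4, 5, 6};    /* 3x2, ldb = 3 */
    double c_gemm[] = {10, 20};               /* 1x2, ldc = 1 */
    double c_gemv[] = {10, 20};

    ref_gemm_nn(1, 2, 3, 2.0, a, 2, b, 3, 1.0, c_gemm, 1);
    gemm_m1_as_gemv(3, 2, 2.0, a, 2, b, 3, 1.0, c_gemv, 1);

    return c_gemm[0] == c_gemv[0] && c_gemm[1] == c_gemv[1]
        && c_gemv[0] == 38.0 && c_gemv[1] == 84.0;
}
```

The padding values (99) in `a` are never read, which is the point: the stride argument stands in for any flattening. Transposed operands and row-major callers would need their own mapping, which this sketch does not cover.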
Thanks... I have an equally unfinished newer version lying around somewhere but got caught up in other things. Let me get the fixups for the SCAL fallout out of the way... but if anybody beats me to it on this here topic it's fine of course. (probably need to remove this from the 0.3.28 milestone anyway so that the release does not get delayed all summer)
@martin-frbg This is an important PR since some project cases have a large portion of GEMM in which N=1. In these cases we are spending significant time packing buffer(s) which is not necessarily needed if GEMV was called instead.
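As a rough sketch of the N=1 case described above, assuming column-major storage and non-transposed operands: when B is kx1, it and C are already contiguous vectors, so a dispatcher could call GEMV and skip the GEMM path entirely. The names `naive_gemm`, `naive_gemv_n`, `gemm_dispatch` and `demo_n1_dispatch` are hypothetical and do not reflect OpenBLAS's actual interface code.

```c
#include <stddef.h>

/* Hypothetical reference GEMM (column-major, no transposes):
   C = alpha*A*B + beta*C with A m x k, B k x n, C m x n. */
static void naive_gemm(int m, int n, int k, double alpha,
                       const double *a, int lda,
                       const double *b, int ldb,
                       double beta, double *c, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            c[i + (size_t)j * ldc] = alpha * acc + beta * c[i + (size_t)j * ldc];
        }
}

/* Hypothetical reference GEMV, not transposed (column-major):
   y = alpha*A*x + beta*y with A rows x cols, x of length cols, y of length rows. */
static void naive_gemv_n(int rows, int cols, double alpha,
                         const double *a, int lda,
                         const double *x, int incx,
                         double beta, double *y, int incy)
{
    for (int i = 0; i < rows; i++) {
        double acc = 0.0;
        for (int j = 0; j < cols; j++)
            acc += a[i + (size_t)j * lda] * x[(size_t)j * incx];
        y[(size_t)i * incy] = alpha * acc + beta * y[(size_t)i * incy];
    }
}

/* Forwards to GEMV when B has a single column; returns 1 if the GEMV path was taken. */
static int gemm_dispatch(int m, int n, int k, double alpha,
                         const double *a, int lda,
                         const double *b, int ldb,
                         double beta, double *c, int ldc)
{
    if (n == 1) {
        /* B (k x 1) and C (m x 1) are single columns, so unit stride applies. */
        naive_gemv_n(m, k, alpha, a, lda, b, 1, beta, c, 1);
        return 1;
    }
    naive_gemm(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    return 0;
}

/* Returns 1 when the dispatcher took the GEMV path and matches the GEMM result. */
static int demo_n1_dispatch(void)
{
    const double a[] = {1, 2, 3, 4, 5, 6}; /* 2x3, lda = 2 */
    const double b[] = {1, 1, 2};          /* 3x1, ldb = 3 */
    double c_fast[] = {0, 0};
    double c_ref[] = {0, 0};

    int used_gemv = gemm_dispatch(2, 1, 3, 1.0, a, 2, b, 3, 0.0, c_fast, 2);
    naive_gemm(2, 1, 3, 1.0, a, 2, b, 3, 0.0, c_ref, 2);

    return used_gemv == 1 && c_fast[0] == c_ref[0] && c_fast[1] == c_ref[1];
}
```

A real implementation would also have to handle the transposed and conjugated variants and decide whether very small problems are worth redirecting at all.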
I am aware of that, but this has been an important issue for roughly 20 years (i.e. since inception of GotoBLAS), last discussed here sometime in 2015/16 IIRC. We're almost 2 weeks past the tentative release date for 0.3.28, it bundles an excessive number of changes already, and I still need to come up with assembly code fixes for the SCAL issue in a number of architectures (where assembly isn't my strongest skill anyway).
@martin-frbg I took a look into this in #4814
closing as superseded by #4814