OpenBLAS Element ordering of inner GEMM kernels

Open · jerryz123 opened this issue 6 years ago · 3 comments

I'm working on optimizing the inner GEMM kernels for RISC-V. I'm confused about how the arrays are arranged by the time S/DGEMMKERNEL is called. The ba[] and bb[] array arguments seem to be arranged so that ba[] is row-major and bb[] is column-major, which turns the matrix multiply into a series of dot products.
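To make that layout concrete, here is a minimal reference micro-kernel in C. It is a sketch, not OpenBLAS source: the names `micro_kernel_ref`, `MR` and `NR` are hypothetical, and it assumes a packing in which, for every k step, `ba` holds MR consecutive values from one column of A and `bb` holds NR consecutive values from one row of B, with C stored column-major. Under that assumption each entry of C is accumulated as a dot product over k of strided packed values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register-block sizes for this sketch. */
#define MR 2
#define NR 2

/* Reference micro-kernel on packed panels (sketch, not OpenBLAS source).
 * Assumed layout: for each k, ba holds MR consecutive A values (rows i..i+MR-1
 * of column k) and bb holds NR consecutive B values (columns j..j+NR-1 of row k).
 * C is column-major with leading dimension ldc; computes C += alpha * A * B. */
static void micro_kernel_ref(size_t bm, size_t bn, size_t bk, double alpha,
                             const double *ba, const double *bb,
                             double *c, size_t ldc)
{
    assert(bm % MR == 0 && bn % NR == 0);
    for (size_t j = 0; j < bn; j += NR) {
        const double *pb = bb + j * bk;          /* NR x bk panel of B */
        for (size_t i = 0; i < bm; i += MR) {
            const double *pa = ba + i * bk;      /* MR x bk panel of A */
            double acc[MR][NR] = {{0.0}};        /* register tile of C */
            for (size_t k = 0; k < bk; k++)
                for (size_t jj = 0; jj < NR; jj++)
                    for (size_t ii = 0; ii < MR; ii++)
                        acc[ii][jj] += pa[k * MR + ii] * pb[k * NR + jj];
            /* Scale by alpha and add the tile into C. */
            for (size_t jj = 0; jj < NR; jj++)
                for (size_t ii = 0; ii < MR; ii++)
                    c[(i + ii) + (j + jj) * ldc] += alpha * acc[ii][jj];
        }
    }
}
```

For example, with A = [[1,2],[3,4]] and B = [[5,6],[7,8]], the packed inputs are `ba = {1,3,2,4}` and `bb = {5,6,7,8}`, and starting from a zeroed C with alpha = 1 the kernel writes `{19,43,22,50}` (column-major), i.e. [[19,22],[43,50]].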

Furthermore, the elements of ba[] and bb[] are rearranged so that each 2x2 (or whatever size) block of elements is stored contiguously. I guess this is to improve locality?
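The block rearrangement can be sketched as two packing routines. This is a hypothetical illustration under assumed conventions (column-major source matrices, MR = NR = 2; `pack_a` and `pack_b` are made-up names), not the actual OpenBLAS copy routines: each MR-row sliver of A and each NR-column sliver of B is copied into a contiguous buffer, interleaved by k, so that a micro-kernel can read both buffers sequentially.

```c
#include <assert.h>
#include <stddef.h>

#define MR 2  /* hypothetical register-block rows    */
#define NR 2  /* hypothetical register-block columns */

/* Pack an m x k column-major block of A (leading dimension lda) into
 * MR-row panels: within a panel, the MR values of each column p are contiguous. */
static void pack_a(size_t m, size_t k, const double *a, size_t lda, double *out)
{
    assert(m % MR == 0);
    for (size_t i = 0; i < m; i += MR)
        for (size_t p = 0; p < k; p++)
            for (size_t ii = 0; ii < MR; ii++)
                *out++ = a[(i + ii) + p * lda];
}

/* Pack a k x n column-major block of B (leading dimension ldb) into
 * NR-column panels: within a panel, the NR values of each row p are contiguous. */
static void pack_b(size_t k, size_t n, const double *b, size_t ldb, double *out)
{
    assert(n % NR == 0);
    for (size_t j = 0; j < n; j += NR)
        for (size_t p = 0; p < k; p++)
            for (size_t jj = 0; jj < NR; jj++)
                *out++ = b[p + (j + jj) * ldb];
}
```

For example, packing the 4x2 column-major A stored as `{1,2,3,4,5,6,7,8}` yields `{1,2,5,6,3,4,7,8}`: rows 0-1 for both values of k, then rows 2-3.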

Is there any way to write the inner kernel so that it receives A[] and B[] in the same arrangement (both row-major or both column-major), without element reordering? The RISC-V vector extension makes GEMM very simple when the operands are arranged this way, since it is optimized for loads and stores of long, contiguously stored arrays.
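For comparison, the kind of kernel asked about here, with A, B and C all row-major and no packing, might look like the sketch below (the function name `gemm_rowmajor_axpy` is hypothetical, and the code is plain scalar C). The innermost loop is an axpy over a full row of B and a full row of C, both contiguous, so a vector ISA could strip-mine it by the hardware vector length using unit-stride loads, a multiply-add with the scalar A[i,p], and unit-stride stores.

```c
#include <assert.h>
#include <stddef.h>

/* Unpacked GEMM on row-major operands: C (m x n) += A (m x k) * B (k x n).
 * For each (i, p) the inner loop computes C[i,:] += A[i,p] * B[p,:],
 * walking row p of B and row i of C with unit stride. A vectorizing
 * compiler or vector intrinsics would replace the scalar j loop. */
static void gemm_rowmajor_axpy(size_t m, size_t n, size_t k,
                               const double *a, size_t lda,
                               const double *b, size_t ldb,
                               double *c, size_t ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    for (size_t i = 0; i < m; i++)
        for (size_t p = 0; p < k; p++) {
            const double aip = a[i * lda + p];   /* scalar from row i of A */
            const double *brow = b + p * ldb;    /* contiguous row p of B  */
            double *crow = c + i * ldc;          /* contiguous row i of C  */
            for (size_t j = 0; j < n; j++)
                crow[j] += aip * brow[j];
        }
}
```

With A = `{1,2,3,4}` and B = `{5,6,7,8}` (both 2x2 row-major) and a zeroed C, the result is `{19,22,43,50}`. The trade-off is that, without packing and cache blocking, large matrices likely lose much of the cache reuse the packed layout is designed to provide.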

Apr 17 '18 23:04 jerryz123

You can redefine anything you find in /common_macro.h.

Apr 18 '18 13:04 brada4

I suspect you would need to change driver/level3/level3.c for that, basically adding an "#if defined(RISCV)" branch that skips all the rearranging of the arguments. Unfortunately the code is not really documented; the best we have is https://github.com/xianyi/OpenBLAS/wiki/Developer-manual (which just gives a brief overview of the code organization and a link to K. Goto's original paper).

Apr 18 '18 21:04 martin-frbg

I see, thanks. I'm starting to realize that much of the code is heavily tuned for packed-SIMD architectures. Unfortunately, this makes optimizing the package for a vector architecture somewhat cumbersome.

Apr 18 '18 21:04 jerryz123
