OpenBLAS
[WIP] Arm®v9-A architecture SME2 SGEMM kernels
Add an implementation of SGEMM based on the Arm®v9-A architecture Scalable Matrix Extension (SME), written using the Arm C Language Extensions (ACLE).
This includes a new target, ARMV9SME, for generic SME2 targets. The new target inherits the existing ARMV8SVE settings by default. It can only be built using an SME-capable toolchain such as GCC 14 or LLVM 19.
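As a rough illustration of the toolchain requirement, the sketch below checks at compile time whether SME2 is enabled for the translation unit. It relies on the ACLE feature macro `__ARM_FEATURE_SME2`. The helper name `built_with_sme2` is hypothetical, and this is not necessarily how this PR or the OpenBLAS build system detects support.

```c
#include <assert.h>

/* Hypothetical helper (not part of OpenBLAS): returns 1 if the compiler
 * was invoked with SME2 enabled for this translation unit, else 0.
 * __ARM_FEATURE_SME2 is the ACLE feature-test macro for SME2. */
static int built_with_sme2(void)
{
#if defined(__ARM_FEATURE_SME2)
    return 1;
#else
    return 0;
#endif
}
```

On a toolchain or target without SME2 enabled, the function simply returns 0, so the same source still compiles.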
The SME2 kernel performs outer products on panels of A and B, accumulating into 2x2 inner blocks of C via the SME two-dimensional architectural register, ZA.
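The accumulation pattern can be modelled in portable scalar C. This is a minimal sketch, not the kernel in this PR. It assumes the "2x2 inner blocks" are a 2x2 arrangement of accumulator tiles, uses a small fixed tile size `SUB` in place of the hardware vector length, and assumes a packed panel layout. The function name and every constant are hypothetical, and a plain array stands in for ZA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: SUB stands in for the number of 32-bit elements per
 * streaming vector; BLK is the assumed 2x2 arrangement of accumulator tiles. */
enum { SUB = 2, BLK = 2, PANEL = SUB * BLK };

/* Scalar model of an outer-product SGEMM micro-kernel.
 * a_panel: k_len steps, each holding PANEL contiguous values (a column slice of A)
 * b_panel: k_len steps, each holding PANEL contiguous values (a row slice of B)
 * c:       row-major PANEL x PANEL block of C with leading dimension ldc
 * Computes C += sum over k of outer(a_panel[k], b_panel[k]). */
static void sgemm_outer_product_block(size_t k_len,
                                      const float *a_panel,
                                      const float *b_panel,
                                      float *c, size_t ldc)
{
    /* Stand-in for ZA: a 2x2 grid of SUB x SUB accumulator tiles. */
    float acc[BLK][BLK][SUB][SUB] = {{{{0.0f}}}};

    for (size_t k = 0; k < k_len; ++k) {
        const float *a_col = a_panel + k * PANEL;
        const float *b_row = b_panel + k * PANEL;
        /* One rank-1 update per tile for this k step. */
        for (int bi = 0; bi < BLK; ++bi)
            for (int bj = 0; bj < BLK; ++bj)
                for (int i = 0; i < SUB; ++i)
                    for (int j = 0; j < SUB; ++j)
                        acc[bi][bj][i][j] +=
                            a_col[bi * SUB + i] * b_row[bj * SUB + j];
    }

    /* Write the accumulators back into C once, after the k loop. */
    for (int bi = 0; bi < BLK; ++bi)
        for (int bj = 0; bj < BLK; ++bj)
            for (int i = 0; i < SUB; ++i)
                for (int j = 0; j < SUB; ++j)
                    c[(size_t)(bi * SUB + i) * ldc + (size_t)(bj * SUB + j)] +=
                        acc[bi][bj][i][j];
}
```

Keeping the accumulators live across the whole k loop and touching C only once at the end is the point of the outer-product formulation: each loaded slice of A and B is reused across every tile in the block.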
Note: this is a WIP target. It is functional for SGEMM, and all GEMM tests are passing. Other BLAS3 routines have not been updated to match the larger kernel size, so SYMM/TRMM tests are currently expected to fail in this WIP state.