OpenBLAS
Add ASIMD Small GEMM kernels
These kernels are an experiment to see whether using ASIMD instead of SVE improves performance a bit on cores with 128-bit SVE vectors.
They are probably helpful for #2712, even though they did not appear to produce any speedup for the mystery Graviton4 workload.