OpenBLAS
[WIP] Optimize GEMM for small matrices
- [x] Add basic implementation (please check aae6af94bbe4f7ad97c417e40fe6a7d4a2798b79)
- [ ] Merge sgemm_kernel_direct implementation
- [ ] Work for DYNAMIC_ARCH
- [ ] Tune the input matrix size.
- [ ] Add optimized kernel for architecture.
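
For readers unfamiliar with the idea behind this checklist, here is a minimal sketch of a small-matrix GEMM path: check the problem size against a cutoff and, if it is small, compute directly on the caller's buffers without packing. This is not the code in this PR. The names `small_gemm_permit`, `sgemm_small_nn`, and `SMALL_GEMM_THRESHOLD` are hypothetical, and the cutoff value is a placeholder assumption; the real one would come from the tuning item above.

```c
#include <stddef.h>

/* Hypothetical cutoff on M*N*K; a placeholder, not a tuned value. */
#define SMALL_GEMM_THRESHOLD (100.0 * 100.0 * 100.0)

/* Returns 1 if the problem is small enough that skipping the packing
 * step might pay off, 0 otherwise. Hypothetical helper. */
static int small_gemm_permit(size_t m, size_t n, size_t k)
{
    return (double)m * (double)n * (double)k <= SMALL_GEMM_THRESHOLD;
}

/* Reference small-matrix SGEMM, column-major, no transposes:
 *   C = alpha * A * B + beta * C
 * A is m x k (lda), B is k x n (ldb), C is m x n (ldc).
 * It reads A and B in place instead of copying them into packed buffers. */
static void sgemm_small_nn(size_t m, size_t n, size_t k, float alpha,
                           const float *a, size_t lda,
                           const float *b, size_t ldb,
                           float beta, float *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            float acc = 0.0f;
            for (size_t l = 0; l < k; l++)
                acc += a[i + l * lda] * b[l + j * ldb];
            /* With beta == 0, C is overwritten without being read,
             * so stale NaN/Inf values in C do not propagate. */
            if (beta == 0.0f)
                c[i + j * ldc] = alpha * acc;
            else
                c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

A caller would test `small_gemm_permit(m, n, k)` first and fall back to the regular packed GEMM path when it returns 0. An architecture-specific kernel would replace the scalar inner loop with vectorized code.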
You're probably aware of this project, but I'm leaving it here for reference: https://github.com/hfp/libxsmm
See https://github.com/xianyi/OpenBLAS/issues/3783