OpenBLAS

[WIP] Optimize gemm for small matrix
- [x] Add basic implementation (please check aae6af94bbe4f7ad97c417e40fe6a7d4a2798b79)
- [ ] Merge sgemm_kernel_direct implementation
- [ ] Work for DYNAMIC_ARCH
- [ ] Tune the input matrix size.
- [ ] Add optimized kernel for architecture.
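
For readers new to the idea behind this checklist, here is a minimal sketch of a small-matrix path: a size check decides whether a problem is small enough to skip the usual packing step, and a direct kernel then computes the result straight from the caller's buffers. This is not the code in this PR. The names `small_gemm_permit` and `sgemm_small_nn` and the cutoff `SMALL_GEMM_MNK_LIMIT` are hypothetical, and the cutoff value is an untuned placeholder.

```c
/* Hypothetical cutoff on m*n*k; a real value would need tuning per architecture. */
#define SMALL_GEMM_MNK_LIMIT (100.0 * 100.0 * 100.0)

/* Assumed heuristic: return 1 when the problem is small enough that
 * packing A and B would likely cost more than it saves, else 0. */
static int small_gemm_permit(long m, long n, long k)
{
    if (m <= 0 || n <= 0 || k <= 0)
        return 0;
    return (double)m * (double)n * (double)k <= SMALL_GEMM_MNK_LIMIT;
}

/* Direct, unpacked kernel for the no-transpose case, column-major storage:
 * C = alpha * A * B + beta * C
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc). */
static void sgemm_small_nn(long m, long n, long k, float alpha,
                           const float *a, long lda,
                           const float *b, long ldb,
                           float beta, float *c, long ldc)
{
    for (long j = 0; j < n; j++) {
        for (long i = 0; i < m; i++) {
            float acc = 0.0f;
            for (long p = 0; p < k; p++)
                acc += a[i + p * lda] * b[p + j * ldb];
            /* With beta == 0 the old contents of C are not read,
             * so uninitialized or NaN values in C do not propagate. */
            if (beta == 0.0f)
                c[i + j * ldc] = alpha * acc;
            else
                c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

A caller would test `small_gemm_permit(m, n, k)` first and fall back to the regular packed gemm path when it returns 0. An architecture-specific kernel would replace the inner loops with vectorized code while keeping the same interface.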
 
You're probably aware of this project, but I'm leaving it here for reference: https://github.com/hfp/libxsmm
See https://github.com/xianyi/OpenBLAS/issues/3783