OpenBLAS
SGEMM performance opportunity on POWER8 VSX
SGEMM performance on POWER8 VSX has room for improvement.
For cblas_sgemm_googlenet, M = 192, N = 3136, K = 576 shows the slowest performance, and M = 320, N = 196, K = 1440 seems to have the greatest opportunity for improvement.
For cblas_sgemm_vggnet, M = 256, N = 3136, K = 2304 shows the slowest performance, and M = 512, N = 196, K = 4680 seems to have the greatest opportunity for improvement.
In both cases the opportunity is judged relative to alternate optimized BLAS implementations.
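To compare these shapes across BLAS implementations on equal terms, it may help to convert each measured time into GFLOP/s. The sketch below is not taken from the benchmark itself: the struct, the labels, and the helper names are hypothetical, and it assumes the usual count of 2*M*N*K floating-point operations per SGEMM call (one multiply and one add per inner-product term).

```c
#include <stddef.h>

/* One SGEMM problem size: C (M x N) = A (M x K) * B (K x N).
 * The struct and its labels are illustrative, not part of OpenBLAS. */
typedef struct {
    const char *label;
    int m, n, k;
} sgemm_shape;

/* The four shapes quoted in this report. */
static const sgemm_shape shapes[] = {
    {"googlenet, slowest",      192, 3136,  576},
    {"googlenet, largest gap",  320,  196, 1440},
    {"vggnet, slowest",         256, 3136, 2304},
    {"vggnet, largest gap",     512,  196, 4680},
};

static const size_t num_shapes = sizeof shapes / sizeof shapes[0];

/* Floating-point operations in one SGEMM call, assuming one multiply
 * and one add for every (m, n, k) index triple. */
static double sgemm_flops(int m, int n, int k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Achieved rate in GFLOP/s for a call that took `seconds` of wall time. */
static double sgemm_gflops(int m, int n, int k, double seconds)
{
    return sgemm_flops(m, n, k) / seconds / 1e9;
}
```

A timing harness would wrap each cblas_sgemm call for these shapes with a wall-clock timer, average over several repetitions, and pass the per-call time to sgemm_gflops. Reporting GFLOP/s rather than raw time makes the small-N shapes (N = 196) directly comparable with the large-N ones.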