With oneDNN enabled, why does the call stack captured with perf show arm_gemm from Arm Compute Library instead of BRGEMM from oneDNN?

Open nanzh-19 opened this issue 1 year ago • 0 comments

When I run TF Serving on an x64 machine, TensorFlow uses brgemm_matmul_t for inference, but on an ARM machine it uses arm_gemm instead. How can I use brgemm_matmul on ARM as well, since it provides better performance?
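To cross-check the perf observation, one option is to capture oneDNN's verbose log and count which implementation each executed primitive reports. The sketch below is only an illustration: it assumes the TF Serving build honors oneDNN's `ONEDNN_VERBOSE=1` environment variable, and that in each `exec` line the three fields after `exec` are the engine, the primitive kind, and the implementation name. The helper name `implementation_counts` and the sample lines are hypothetical and are not output from a real run.

```python
from collections import Counter


def implementation_counts(log_lines):
    """Count (primitive kind, implementation name) pairs for executed primitives.

    Assumes comma-separated oneDNN verbose lines in which the fields after
    "exec" are: engine, primitive kind, implementation name.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.strip().split(",")
        # Skip anything that is not a oneDNN verbose "exec" record.
        if fields[0] != "onednn_verbose" or "exec" not in fields:
            continue
        i = fields.index("exec")
        if len(fields) > i + 3:
            counts[(fields[i + 2], fields[i + 3])] += 1
    return counts


# Hypothetical sample lines, written by hand for illustration only.
sample = [
    "onednn_verbose,exec,cpu,matmul,brg:avx512_core,undef,src_f32 wei_f32 dst_f32,,,4x8:8x16,0.01",
    "onednn_verbose,exec,cpu,matmul,gemm:acl,undef,src_f32 wei_f32 dst_f32,,,4x8:8x16,0.02",
    "onednn_verbose,info,cpu,runtime:OpenMP",
]

for (kind, impl), n in implementation_counts(sample).items():
    print(f"{kind:10s} {impl:20s} {n}")
```

Locating the `exec` field by name rather than by fixed position is a deliberate choice here, since the number of leading fields in the verbose output may differ between oneDNN versions.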

Sep 26 '24 12:09 nanzh-19