With oneDNN enabled, why does the call stack captured with perf show arm_gemm from ARM Compute Library instead of BRGEMM from oneDNN?
When I run TF Serving on an x64 machine, TensorFlow uses brgemm_matmul_t for inference, but on an ARM machine it uses arm_gemm instead. How can I make it use brgemm_matmul on ARM as well? From what I have seen, brgemm_matmul performs better.
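For anyone who wants to cross-check the dispatch without perf: oneDNN can log every primitive it executes when the process is started with the environment variable `ONEDNN_VERBOSE=1`. Below is a minimal sketch of a script that tallies the implementation names from such a log. The helper name `count_impls`, the assumed field order (`exec`, engine, primitive kind, implementation), and the two sample lines are illustrative assumptions rather than output from my machines, and the exact log layout may differ between oneDNN versions.

```python
from collections import Counter


def count_impls(log_text):
    """Count (primitive kind, implementation) pairs in oneDNN verbose output."""
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.strip().split(",")
        # Keep only lines produced by oneDNN verbose mode.
        if fields[0] not in ("onednn_verbose", "dnnl_verbose"):
            continue
        # Keep only execution records (skip info and create records).
        if "exec" not in fields:
            continue
        i = fields.index("exec")
        # Assumed layout: exec, engine, primitive kind, implementation, ...
        if len(fields) > i + 3:
            counts[(fields[i + 2], fields[i + 3])] += 1
    return counts


# Hypothetical sample lines, shortened for illustration (not real measurements).
SAMPLE = """\
onednn_verbose,info,cpu,runtime:OpenMP
onednn_verbose,exec,cpu,matmul,brg:avx512_core,undef,src_f32 wei_f32 dst_f32,,,4x8:8x16,0.01
onednn_verbose,exec,cpu,matmul,gemm:acl,undef,src_f32 wei_f32 dst_f32,,,4x8:8x16,0.02
onednn_verbose,exec,cpu,matmul,gemm:acl,undef,src_f32 wei_f32 dst_f32,,,4x8:8x16,0.02
"""

if __name__ == "__main__":
    for (kind, impl), n in count_impls(SAMPLE).most_common():
        print(f"{kind:10s} {impl:20s} {n}")
```

To use it on a real run, capture the server's standard output to a file with `ONEDNN_VERBOSE=1` set, then pass the file contents to `count_impls`. An implementation name that mentions ACL would be consistent with the arm_gemm frames seen in perf.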