XNNPACK enable AVX_VNNI_INT8 instruction for qs8-qc8w-gemm/igemm

Open xujuntwt95329 opened this issue 6 months ago • 1 comments

The new AVX_VNNI_INT8 instructions avoid the XOR operation in the qs8-qc8w gemm/igemm kernels, resulting in a ~3% performance improvement on MobileNet v1 and v2 int8 models.

-------------------------------------------------------------------------------------------------------
Benchmark                                                       Time       CPU     Iterations
-------------------------------------------------------------------------------------------------------
qs8_qc8w_gemm_5x8c8__avxvnni_prfm/mobilenet_v1/real_time       3356 us   3305 us      208  <-- orig
qs8_qc8w_gemm_5x8c8__avxvnniint8_prfm/mobilenet_v1/real_time   3249 us   3255 us      216  <-- avxvnniint8
-------------------------------------------------------------------------------------------------------
qs8_qc8w_gemm_5x8c8__avxvnni_prfm/mobilenet_v2/real_time       2783 us   2739 us      251  <-- orig
qs8_qc8w_gemm_5x8c8__avxvnniint8_prfm/mobilenet_v2/real_time   2674 us   2684 us      262  <-- avxvnniint8
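
For readers unfamiliar with where the XOR comes from, here is a minimal scalar sketch in C, not the actual XNNPACK kernel code. It assumes the AVX-VNNI path uses the unsigned-by-signed dot product (VPDPBUSD) and therefore has to flip the sign bit of each signed activation and correct for the resulting +128 offset, while the signed-by-signed dot product added by AVX-VNNI-INT8 (VPDPBSSD) consumes the activations directly. The function names are hypothetical and each function models a single 32-bit accumulator lane.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of an unsigned x signed dot product
   (the VPDPBUSD form): four uint8 activations times four int8 weights. */
static int32_t dot4_u8s8(int32_t acc, const uint8_t a[4], const int8_t w[4]) {
  for (int i = 0; i < 4; i++) {
    acc += (int32_t) a[i] * (int32_t) w[i];
  }
  return acc;
}

/* Scalar model of one 32-bit lane of a signed x signed dot product
   (the VPDPBSSD form): no preprocessing of the activations is needed. */
static int32_t dot4_s8s8(int32_t acc, const int8_t a[4], const int8_t w[4]) {
  for (int i = 0; i < 4; i++) {
    acc += (int32_t) a[i] * (int32_t) w[i];
  }
  return acc;
}

/* Signed activations through the unsigned x signed form.
   XOR with 0x80 maps an int8 value x to the uint8 value x + 128, so the
   result carries an extra 128 * sum(w) that must be subtracted. A real
   kernel could fold that correction into the bias ahead of time (assumed
   here); this sketch subtracts it inline for clarity. */
static int32_t dot4_via_xor(int32_t acc, const int8_t a[4], const int8_t w[4]) {
  uint8_t a_u8[4];
  int32_t w_sum = 0;
  for (int i = 0; i < 4; i++) {
    a_u8[i] = (uint8_t) a[i] ^ 0x80;
    w_sum += w[i];
  }
  return dot4_u8s8(acc, a_u8, w) - 128 * w_sum;
}

/* Hypothetical self-check: both paths must agree on the same inputs. */
static void check_paths_agree(void) {
  const int8_t a[4] = {-128, -1, 0, 127};
  const int8_t w[4] = {127, -128, 5, -7};
  assert(dot4_via_xor(0, a, w) == dot4_s8s8(0, a, w));
}
```

Both paths produce the same accumulator value; the signed-by-signed form simply drops the per-load XOR from the inner loop, which is consistent with the small but measurable gain in the benchmark above.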

Jul 30 '24 07:07 xujuntwt95329
