XNNPACK
enable AVX_VNNI_INT8 instruction for qs8-qc8w-gemm/igemm
The new AVX_VNNI_INT8 instructions avoid the XOR operation in the qs8-qc8w gemm/igemm kernels, resulting in a roughly 3-4% performance improvement on the MobileNet v1 and v2 int8 models.
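As a rough illustration of why the XOR can be dropped, here is a scalar C model of the two approaches. It is a sketch under assumptions, not the actual kernel code: the function names are hypothetical, and the bias adjustment is assumed to happen when the weights are packed. AVX-VNNI's VPDPBUSD multiplies unsigned by signed bytes, so signed activations must first be shifted by 128 (an XOR with 0x80) and the bias compensated. AVX-VNNI-INT8's VPDPBSSD multiplies signed by signed bytes directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Scalar model of one 32-bit lane of VPDPBUSD (AVX-VNNI):
// four unsigned 8-bit activations times four signed 8-bit weights.
static int32_t dot_u8s8(int32_t acc, const uint8_t a[4], const int8_t w[4]) {
  for (size_t i = 0; i < 4; i++) acc += (int32_t) a[i] * (int32_t) w[i];
  return acc;
}

// Scalar model of one 32-bit lane of VPDPBSSD (AVX-VNNI-INT8):
// four signed 8-bit activations times four signed 8-bit weights.
static int32_t dot_s8s8(int32_t acc, const int8_t a[4], const int8_t w[4]) {
  for (size_t i = 0; i < 4; i++) acc += (int32_t) a[i] * (int32_t) w[i];
  return acc;
}

// Hypothetical model of the AVX-VNNI path: flip the sign bit of each
// activation (a ^ 0x80 == a + 128 as unsigned) and pre-subtract
// 128 * sum(w) from the bias to compensate.
int32_t dot_avxvnni_model(const int8_t* a, const int8_t* w, size_t k, int32_t bias) {
  assert(k % 4 == 0);
  int32_t wsum = 0;
  for (size_t i = 0; i < k; i++) wsum += w[i];
  int32_t acc = bias - 128 * wsum;  // assumed to be folded in at packing time
  for (size_t i = 0; i < k; i += 4) {
    uint8_t au[4];
    for (size_t j = 0; j < 4; j++) au[j] = (uint8_t) ((uint8_t) a[i + j] ^ 0x80);
    acc = dot_u8s8(acc, au, w + i);
  }
  return acc;
}

// Hypothetical model of the AVX-VNNI-INT8 path: no XOR, no bias compensation.
int32_t dot_avxvnniint8_model(const int8_t* a, const int8_t* w, size_t k, int32_t bias) {
  assert(k % 4 == 0);
  int32_t acc = bias;
  for (size_t i = 0; i < k; i += 4) acc = dot_s8s8(acc, a + i, w + i);
  return acc;
}
```

Both functions return the same accumulator for any int8 inputs; the second simply skips the per-load XOR in the inner loop, which is where the saving is expected to come from.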
-------------------------------------------------------------------------------------------------------
Benchmark                                                            Time        CPU   Iterations
-------------------------------------------------------------------------------------------------------
qs8_qc8w_gemm_5x8c8__avxvnni_prfm/mobilenet_v1/real_time          3356 us    3305 us          208   <-- orig
qs8_qc8w_gemm_5x8c8__avxvnniint8_prfm/mobilenet_v1/real_time      3249 us    3255 us          216   <-- avxvnniint8
-------------------------------------------------------------------------------------------------------
qs8_qc8w_gemm_5x8c8__avxvnni_prfm/mobilenet_v2/real_time          2783 us    2739 us          251   <-- orig
qs8_qc8w_gemm_5x8c8__avxvnniint8_prfm/mobilenet_v2/real_time      2674 us    2684 us          262   <-- avxvnniint8