XNNPACK

add f32-gemm-5x16-minmax-fma3-broadcast-prfm microkernel

Open Ch3nYuY opened this issue 6 months ago • 2 comments

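This change prefetches the weights into the L1 cache in xnn_f32_gemm_minmax_ukernel_5x16__fma3_broadcast. Across the MobileNet V1/V2/V3_Large/V3_Small models the average real-time improvement is over 3%, ranging from about 0.3% (V3_Large) to about 9.4% (V1). In the results, `orig` is the existing kernel and `prefetch` is the new variant.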
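For readers unfamiliar with the technique, here is a minimal scalar sketch of weight prefetching in a 5x16 GEMM tile. It is not the actual microkernel: the real kernel uses FMA3 vector instructions, while this stand-in uses plain C loops and the GCC/Clang `__builtin_prefetch` hint. The function name, the packed-weight layout (16 bias values followed by `kc` rows of 16 weights), and the prefetch distance `SKETCH_PREFETCH_ROWS` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Prefetch hint: GCC/Clang builtin (read access, high temporal locality);
 * compiles to nothing on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREFETCH_L1(p) __builtin_prefetch((p), 0, 3)
#else
#define SKETCH_PREFETCH_L1(p) ((void) (p))
#endif

#define SKETCH_MR 5u             /* rows of the output tile */
#define SKETCH_NR 16u            /* columns of the output tile */
#define SKETCH_PREFETCH_ROWS 8u  /* hypothetical prefetch distance, in weight rows */

/*
 * Computes c[m][n] = clamp(bias[n] + sum_k a[m][k] * w[k][n], min, max)
 * for one 5x16 tile. Assumed layout of w: 16 bias values, then kc rows
 * of 16 weights each. Strides are in elements, not bytes.
 */
static void sketch_gemm_5x16_minmax_prfm(
    size_t kc, const float* a, size_t a_stride,
    const float* w, float* c, size_t c_stride,
    float min, float max)
{
  assert(a != NULL && w != NULL && c != NULL);
  assert(min <= max);

  /* Initialize every accumulator row with the bias. */
  float acc[SKETCH_MR][SKETCH_NR];
  for (size_t m = 0; m < SKETCH_MR; m++) {
    for (size_t n = 0; n < SKETCH_NR; n++) {
      acc[m][n] = w[n];
    }
  }
  w += SKETCH_NR;

  for (size_t k = 0; k < kc; k++) {
    /* Request a weight row we will need a few iterations from now,
     * staying inside the packed weight buffer. */
    if (k + SKETCH_PREFETCH_ROWS < kc) {
      SKETCH_PREFETCH_L1(w + SKETCH_PREFETCH_ROWS * SKETCH_NR);
    }
    /* Broadcast one element of A per row against 16 weights. */
    for (size_t m = 0; m < SKETCH_MR; m++) {
      const float va = a[m * a_stride + k];
      for (size_t n = 0; n < SKETCH_NR; n++) {
        acc[m][n] += va * w[n];
      }
    }
    w += SKETCH_NR;
  }

  /* Clamp to [min, max] and store the tile. */
  for (size_t m = 0; m < SKETCH_MR; m++) {
    for (size_t n = 0; n < SKETCH_NR; n++) {
      float v = acc[m][n];
      v = v < min ? min : v;
      v = v > max ? max : v;
      c[m * c_stride + n] = v;
    }
  }
}
```

The prefetch is only a hint and does not change the result, so the output should match the same loop without it. Whether it helps depends on how far ahead the hint is issued relative to memory latency, which is why the distance is a tunable constant here.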

----------------------------------------------------------------------------------------------------------
Benchmark                                                                Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------
f32_gemm_5x16__fma3_broadcast/mobilenet_v1/real_time                 11090 us        10237 us           58     <-- orig
f32_gemm_5x16__fma3_broadcast_prfm/mobilenet_v1/real_time            10049 us        10045 us           70     <-- prefetch
----------------------------------------------------------------------------------------------------------
f32_gemm_5x16__fma3_broadcast/mobilenet_v2/real_time                  6441 us         6366 us          108     <-- orig
f32_gemm_5x16__fma3_broadcast_prfm/mobilenet_v2/real_time             6085 us         6250 us          115     <-- prefetch
----------------------------------------------------------------------------------------------------------
f32_gemm_5x16__fma3_broadcast/mobilenet_v3_large/real_time            5761 us         5682 us          121     <-- orig
f32_gemm_5x16__fma3_broadcast_prfm/mobilenet_v3_large/real_time       5743 us         5777 us          119     <-- prefetch
----------------------------------------------------------------------------------------------------------
f32_gemm_5x16__fma3_broadcast/mobilenet_v3_small/real_time            1861 us         1833 us          375     <-- orig
f32_gemm_5x16__fma3_broadcast_prfm/mobilenet_v3_small/real_time       1826 us         1810 us          397     <-- prefetch
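
The "over 3%" average can be checked from the Time column above. The sketch below is one way to do that arithmetic, assuming improvement is measured as the relative reduction in real time against the `orig` row; the helper names are hypothetical and the values are copied from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Real-time results in microseconds, copied from the benchmark table
 * (order: mobilenet_v1, v2, v3_large, v3_small). */
static const double sketch_orig_us[4] = {11090.0, 6441.0, 5761.0, 1861.0};
static const double sketch_prfm_us[4] = {10049.0, 6085.0, 5743.0, 1826.0};

/* Relative reduction in time, as a percentage of the baseline. */
static double sketch_improvement_pct(double orig_us, double prfm_us)
{
  assert(orig_us > 0.0);
  return 100.0 * (orig_us - prfm_us) / orig_us;
}

/* Arithmetic mean of the per-model improvements. */
static double sketch_mean_improvement_pct(
    const double* orig_us, const double* prfm_us, size_t n)
{
  assert(n > 0);
  double sum = 0.0;
  for (size_t i = 0; i < n; i++) {
    sum += sketch_improvement_pct(orig_us[i], prfm_us[i]);
  }
  return sum / (double) n;
}
```

Calling `sketch_mean_improvement_pct(sketch_orig_us, sketch_prfm_us, 4)` gives roughly 4.3%, consistent with the stated figure. Note that this uses wall-clock time only; the CPU column shows a small regression for mobilenet_v3_large (5682 us to 5777 us).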

Ch3nYuY, Aug 22 '24 09:08