TinyEngine convolutional layer has greater latency than ARM's CMSIS-NN

Open • ellial opened this issue 1 year ago • 1 comment

Hello,

I measured the latency of one of TinyEngine's convolutional kernels (convolve_s8_kernel3_stride1_pad1) against CMSIS-NN's fast convolutional kernel (arm_convolve_HWC_q7_fast). The TinyEngine kernel took approximately 200,000 cycles, while the CMSIS-NN kernel took approximately 130,000 cycles.

Is the additional overhead due to TinyEngine's per-channel requantization? Could you explain why per-channel requantization is needed in the kernel?
Have you benchmarked the latency of the two frameworks kernel by kernel? If so, could you share the results?
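
For context, here is a minimal sketch of what per-channel requantization generally means for an int8 convolution: each output channel's int32 accumulator is rescaled with that channel's own scale before being clamped to int8. The function names, the float scales, and the round-half-away-from-zero rounding are assumptions made for illustration; this is not TinyEngine's or CMSIS-NN's actual code.

```c
#include <stdint.h>

/* Requantize one int32 accumulator to int8 (hypothetical helper).
 * scale maps the accumulator domain to the output domain;
 * out_zero_point is the output tensor's zero point. */
int8_t requantize_s8(int32_t acc, float scale, int32_t out_zero_point)
{
    float scaled = (float)acc * scale;
    /* Round half away from zero without needing libm. */
    int32_t rounded = (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    int32_t v = rounded + out_zero_point;

    /* Saturate to the int8 range. */
    if (v < -128) v = -128;
    if (v > 127) v = 127;
    return (int8_t)v;
}

/* Per-channel variant: output channel c uses scales[c].
 * acc holds one accumulator per output channel for a single output pixel. */
void requantize_per_channel_s8(const int32_t *acc, const float *scales,
                               int32_t out_zero_point, int num_channels,
                               int8_t *out)
{
    for (int c = 0; c < num_channels; ++c) {
        out[c] = requantize_s8(acc[c], scales[c], out_zero_point);
    }
}
```

The extra cost relative to a per-layer scheme is one scale lookup and one multiply per output value. As far as I can tell, arm_convolve_HWC_q7_fast instead takes a single bias_shift and out_shift for the whole layer, so the two kernels are not doing the same amount of work per output.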

Thank you in advance.

Apr 02 '23 07:04 ellial

Hi @ellial,

convolve_s8_kernel3_stride1_pad1 is a deprecated kernel and is not actively used in TinyEngine. For 3x3 convolutions, we use https://github.com/mit-han-lab/tinyengine/blob/main/TinyEngine/src/kernels/int_forward_op/convolve_u8_kernel3_inputch3_stride2_pad1.c instead. Please also note that for MobileNet-like models, most of the computation goes to pointwise and depthwise convolutions.
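
To see where the computation goes in a given model, one rough approach is to count multiply-accumulate operations (MACs) per layer type. The sketch below uses the standard MAC formulas; the function names are hypothetical, and the layer shapes you pass in have to come from your own model.

```c
#include <stdint.h>

/* MACs for a standard k x k convolution producing an
 * out_h x out_w x out_ch output from in_ch input channels. */
uint64_t conv_macs(uint64_t out_h, uint64_t out_w,
                   uint64_t in_ch, uint64_t out_ch, uint64_t k)
{
    return out_h * out_w * out_ch * in_ch * k * k;
}

/* MACs for a depthwise k x k convolution: one filter per channel,
 * so there is no sum across input channels. */
uint64_t depthwise_macs(uint64_t out_h, uint64_t out_w,
                        uint64_t ch, uint64_t k)
{
    return out_h * out_w * ch * k * k;
}

/* MACs for a pointwise convolution, i.e. a standard convolution with k = 1. */
uint64_t pointwise_macs(uint64_t out_h, uint64_t out_w,
                        uint64_t in_ch, uint64_t out_ch)
{
    return conv_macs(out_h, out_w, in_ch, out_ch, 1);
}
```

Summing these per layer type over the whole network gives an estimate of how much a single 3x3 kernel contributes to end-to-end latency, although MAC counts are only a proxy for measured cycles.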

Apr 04 '23 17:04 meenchen
