tflite-micro
Int8 Inference Speed Drastically Dropped
I quantized a CNN to int8 and measured its average inference speed: the latency is about 2x slower than the original fp32 model. I then used the profiling tools to print the per-layer latency, and it turns out that ops like CONV_2D take far more ticks in the int8 model. Shouldn't the int8 model be the faster one? Can anyone help me with this problem?
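For context, the conversion roughly followed the standard full-integer post-training quantization flow sketched below. The saved-model path, input shape, and random calibration generator are placeholders for my actual model and calibration data:

```python
import numpy as np
import tensorflow as tf

# Placeholder path to the trained fp32 CNN (SavedModel format).
saved_model_dir = "path/to/fp32_cnn_saved_model"

def representative_dataset():
    # Placeholder calibration data; real input samples are used in practice.
    # The (1, 96, 96, 1) shape is just an example input shape.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full int8 quantization of ops, inputs, and outputs.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_int8_model = converter.convert()
with open("cnn_int8.tflite", "wb") as f:
    f.write(tflite_int8_model)
```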