
Int8 Inference Speed Drastically Dropped

Open zyw02 opened this issue 7 months ago • 1 comment

I quantized a CNN to int8 and measured the average inference speed; the latency is about 2x slower than the original fp32 model. Then I used the profiling tools and printed the latency per layer. It turns out that ops like conv2d take many more ticks in the int8 model. But shouldn't the int8 model be the faster one? Can anyone help me with this problem?
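For reference, this is roughly the post-training quantization flow I followed (a minimal sketch; the saved-model path, input shape, and random calibration data below are placeholders rather than my actual setup):

```python
import numpy as np
import tensorflow as tf

# Placeholder path and input shape for illustration only.
SAVED_MODEL_DIR = "my_cnn_saved_model"
INPUT_SHAPE = (1, 32, 32, 3)

def representative_dataset():
    # Calibration samples drive the int8 scale/zero-point estimation.
    for _ in range(100):
        yield [np.random.rand(*INPUT_SHAPE).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization so every op (including conv2d) runs in int8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_int8_model = converter.convert()
with open("cnn_int8.tflite", "wb") as f:
    f.write(tflite_int8_model)
```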

zyw02 · Jul 20 '24 08:07