tflite-micro
Int8 Inference Speed Drastically Dropped
I quantized a CNN to int8 and measured its average inference speed: the latency is about 2x slower than the original fp32 model. I then used the profiling tools to print the per-layer latency, and it turns out that ops like CONV_2D take far more ticks in the int8 model. Shouldn't the int8 model be the faster one? Can anyone help me with this problem?
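For context, the conversion roughly followed the standard full-integer post-training quantization flow sketched below. The saved-model path, input shape, and random calibration generator are placeholders for my actual model and calibration data:

```python
import numpy as np
import tensorflow as tf

# Placeholder path to the trained fp32 CNN (SavedModel format).
saved_model_dir = "path/to/fp32_cnn_saved_model"

def representative_dataset():
    # Placeholder calibration data; real input samples are used in practice.
    # The (1, 96, 96, 1) shape is just an example input shape.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full int8 quantization of ops, inputs, and outputs.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_int8_model = converter.convert()
with open("cnn_int8.tflite", "wb") as f:
    f.write(tflite_int8_model)
```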