
Int8 Inference Speed Drastically Dropped

Open zyw02 opened this issue 1 year ago • 1 comment

I quantized a CNN to int8 and measured the average inference speed: the latency is about 2x slower than the original fp32 model. I then used the profiling tools and printed the latency per layer. It turns out that ops like conv2d take far more ticks in the int8 model. But shouldn't the int8 model be the faster one? Can anyone help me with this problem?

zyw02 avatar Jul 20 '24 08:07 zyw02
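
For anyone reproducing this, per-op timing in TFLM can be collected with `MicroProfiler`. The sketch below is a minimal example, assuming a recent tflite-micro API (the constructor signature with a profiler argument); the model data array, the op resolver contents, and the arena size are placeholders for your own setup, not values from this issue.

```cpp
// Minimal per-op profiling sketch for tflite-micro (assumed recent API).
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/micro_profiler.h"
#include "tensorflow/lite/schema/schema_generated.h"

namespace {
constexpr size_t kArenaSize = 64 * 1024;       // placeholder, size for your model
alignas(16) uint8_t tensor_arena[kArenaSize];
extern const unsigned char g_model_data[];     // placeholder flatbuffer (int8 model)
}  // namespace

void ProfileOnce() {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register only the ops the model actually uses (placeholders here).
  tflite::MicroMutableOpResolver<4> resolver;
  resolver.AddConv2D();
  resolver.AddMaxPool2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  tflite::MicroProfiler profiler;
  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena, kArenaSize,
                                       /*resource_variables=*/nullptr,
                                       &profiler);
  interpreter.AllocateTensors();

  // (Fill interpreter.input(0) with test data here before invoking.)
  interpreter.Invoke();

  // Prints the recorded ticks for each op invocation so slow layers stand out.
  profiler.LogCsv();
}
```

If conv2d really is slower in int8 than in fp32, one common cause (an assumption here, since the build details aren't given in the thread) is a build that uses the portable reference kernels, which have no fast int8 path; on Cortex-M targets the int8 kernels are usually only fast when the optimized CMSIS-NN kernels are selected at build time (e.g. `OPTIMIZED_KERNEL_DIR=cmsis_nn` with the Make-based build), which is exactly what the next comment asks about.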

Hey @zyw02, could you provide more details? E.g. the model, the hardware, whether you are using reference kernels or optimized kernels, the TFLM version, and any logs or profiling output you have.

ArmRyan avatar Jul 30 '24 08:07 ArmRyan

"This issue is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

github-actions[bot] avatar Aug 24 '24 10:08 github-actions[bot]

"This issue is being closed because it has been marked as stale for 5 days with no further activity."

github-actions[bot] avatar Aug 29 '24 10:08 github-actions[bot]