
Inconsistency of CMSIS-NN Quantization Method (Q-format) with ARM Documentation

Open LEE-SEON-WOO opened this issue 4 months ago • 4 comments

Hello.

I am currently developing with Q-format (Qm.n) quantization. However, on reviewing the revision history, I noticed that, starting with version 4.1.0, the Q-format approach is no longer followed. My current approach follows the methods outlined in the following ARM documentation links:

TensorFlow Lite for Microcontrollers uses a zero point and a scale factor for quantization, which requires additional memory and floating-point operations. Given those costs, Q-format based quantization appears to be more suitable for Cortex-M processors.
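To make the comparison concrete, here is a minimal sketch of the two schemes as I understand them. It is illustrative only and is not taken from CMSIS-NN or TensorFlow Lite for Microcontrollers; the function names, the 8-bit width, and the example scale and zero-point values are my own assumptions.

```python
# Illustrative sketch only: hypothetical helper names, not CMSIS-NN or TFLM APIs.

def clamp(q, total_bits=8):
    """Saturate q to the signed integer range of the given width."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q))

def quantize_q(x, frac_bits, total_bits=8):
    """Qm.n: real value = q * 2**-n, where n = frac_bits."""
    return clamp(int(round(x * (1 << frac_bits))), total_bits)

def dequantize_q(q, frac_bits):
    return q / (1 << frac_bits)

def quantize_affine(x, scale, zero_point, total_bits=8):
    """Affine: real value = scale * (q - zero_point)."""
    return clamp(int(round(x / scale)) + zero_point, total_bits)

def dequantize_affine(q, scale, zero_point):
    return scale * (q - zero_point)

# Q0.7 example: 7 fractional bits in an 8-bit container.
q_fixed = quantize_q(0.7, frac_bits=7)

# The same value in the affine scheme with scale = 2**-7 and zero_point = 0.
q_affine = quantize_affine(0.7, scale=2.0 ** -7, zero_point=0)

# An arbitrary (assumed) scale and zero point, which Q-format cannot express.
q_general = quantize_affine(0.7, scale=0.01, zero_point=-128)
```

In this sketch, Q-format is the special case of the affine scheme in which the scale is a power of two and the zero point is 0, so `q_fixed` and `q_affine` come out equal. The general case needs a per-tensor scale and zero point to be stored alongside the data.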

Could you kindly provide a clear explanation for the necessity of this change? The absence of discussion regarding its impact on speed and accuracy has left me somewhat perplexed. Any insight into the rationale behind this decision would be greatly appreciated, as it would aid in understanding the best practices for quantization within the context of TensorFlow Lite for Microcontrollers and CMSIS-NN.

Thank you for your time and consideration.

LEE-SEON-WOO · Mar 04 '24 15:03