
AIMET - ResNet18 - 8-bit quantized model - Need help regarding model memory size and inference time reduction with acceptable accuracy drop

Open solomonmanuelraj opened this issue 1 year ago • 3 comments

Hi team,

I am using your AIMET tool. Your developer blog (https://developer.qualcomm.com/blog/exploring-aimet-s-post-training-quantization-methods) contains the following statement: "Table 1 - Accuracies of FP32 models versus those optimized with AIMET's CLE and Bias Correction methods - In all three cases, the loss in accuracy (versus the FP32 model) is less than 1%, while model size decreased by four times, from 32-bit to 8-bit." It is not clear to me how the blogger arrives at the fourfold size reduction. What do we need to do to reduce a model from 32 bits to 8 bits? Does the AIMET tool have an API/algorithm to perform this 32-bit to 8-bit/4-bit conversion? In the AIMET Model Zoo you provide a lot of W8A8 models (e.g. for YOLOX-s). I want to know how you created the lower-bit models from the equivalent 32-bit models.

Unfortunately, your research paper only gives details regarding model accuracy; it does not report the memory and inference-time improvement/degradation. It is unfair to compare models on a single metric. In a real use case we need to compare models based on inference time, memory size, and accuracy. Why are you not considering all of these metrics?

My requirement: I have an FP32 model with a large memory footprint and inference time (around 100 MB and 600 ms inference time). I need to reduce both the memory size and the inference time, and I am ready to accept an accuracy drop. I need your help on how AIMET can be used to solve my use case.

Thanks. Waiting for your response.

solomonmanuelraj avatar Nov 09 '23 08:11 solomonmanuelraj

@solomonmanuelraj, AIMET provides quantization simulation, which mimics the accuracy of a quantized target. For how to use AIMET, you can refer to the example code/tutorials.

To realize the actual reduction in inference time and memory, you will have to take the model to actual HW. A minimal sketch of the flow is below.
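For reference, here is roughly what that flow can look like with aimet_torch: CLE followed by an 8-bit QuantizationSimModel, calibration, and export. This is a sketch, not the canonical recipe; `calibration_loader` is a hypothetical DataLoader you would supply, and exact signatures may differ between AIMET releases.

```python
# Sketch of AIMET post-training quantization for ResNet18 (aimet_torch).
# Assumes a torchvision model and a user-supplied calibration loader;
# exact signatures may vary across AIMET versions.
import torch
from torchvision.models import resnet18

from aimet_common.defs import QuantScheme
from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.quantsim import QuantizationSimModel

model = resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Cross-Layer Equalization (CLE), one of the PTQ methods from the blog post
equalize_model(model, input_shapes=(1, 3, 224, 224))

# Simulate W8A8 quantization: 8-bit weights, 8-bit activations
sim = QuantizationSimModel(model,
                           dummy_input=dummy_input,
                           quant_scheme=QuantScheme.post_training_tf_enhanced,
                           default_param_bw=8,
                           default_output_bw=8)

# Calibrate quantization encodings with a few batches of representative data
def pass_calibration_data(sim_model, _):
    with torch.no_grad():
        for images, _ in calibration_loader:  # hypothetical DataLoader
            sim_model(images)

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=None)

# Export the model plus encodings; the size/latency reduction is realized
# only once a target runtime consumes these 8-bit encodings on real HW.
sim.export(path='./output', filename_prefix='resnet18_w8a8',
           dummy_input=dummy_input)
```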

quic-mangal avatar Nov 09 '23 18:11 quic-mangal

Thanks for the update.

Quantization simulation mimics the accuracy of a quantized target. That is good. Will it also mimic the inference time, memory size, and MAC count of a quantized target?

If we could get accuracy, inference time, memory size, and MACs, it would be wonderful; otherwise I need to check those metrics on the target machine, which is a waste of time. Let me know your comments.

I have gone through your developer post (https://developer.qualcomm.com/blog/neural-network-optimization-aimet).

It gave me clear steps and an explanation. This diagram and these steps need to be added to your example/tutorial documentation.

Without this information it will be difficult to measure your tool's value-add.

solomonmanuelraj avatar Nov 10 '23 07:11 solomonmanuelraj

Mathematically, we can predict how much memory reduction and MAC reduction we can achieve, but inference numbers are hard to predict; you could use online HW simulators to get the inference time. The memory side of that prediction is simple arithmetic over the parameter count, as in the plain-PyTorch sketch below (not an AIMET API):

```python
# Back-of-the-envelope estimate of quantized weight-memory size.
# Weight storage scales with bit-width, which is where the ~4x
# reduction from 32-bit to 8-bit in the blog post comes from.
import torch
from torchvision.models import resnet18

model = resnet18(pretrained=True)
num_params = sum(p.numel() for p in model.parameters())  # ~11.7M for ResNet18

for bits in (32, 8, 4):
    size_mb = num_params * bits / 8 / 1e6  # bits -> bytes -> MB
    print(f'W{bits}: ~{size_mb:.1f} MB')
# W32: ~46.8 MB, W8: ~11.7 MB, W4: ~5.8 MB
# (plus a small overhead for per-tensor quantization parameters on target)
```
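MAC counts can be estimated the same way from the layer shapes (they are unchanged by quantization; only the cost per MAC drops on integer HW), but actual latency depends on the target's kernels and memory hierarchy, which is why real numbers need real hardware or a HW simulator.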

quic-mangal avatar Nov 10 '23 18:11 quic-mangal