neural-compressor
How to evaluate a quantised model in PyTorch
Hi,
The quantisation function neural_compressor.quantization.fit returns a PyTorchFXModel object, which contains two members, fp32_model and model. Could you please let me know the correct way of evaluating the quantised model in PyTorch? Following the examples in the documentation, is it correct to evaluate via q_model.forward(), or is there a helper function that takes the quantised model and the test data as inputs?
I am currently seeing large differences between the outputs of q_model.fp32_model(inputs) and q_model(inputs) (or q_model.model(inputs)), and would like to understand whether this is the result of an incorrect PTQ configuration or an incorrect evaluation of the quantised model.
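For reference, my PTQ setup is roughly the following (a minimal sketch assuming the 2.x API; MyModel and calib_loader are placeholders for my actual model class and calibration dataloader):

from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

model = MyModel()                 # placeholder for the actual FP32 model
model.eval()
conf = PostTrainingQuantConfig()  # default static PTQ configuration
q_model = fit(model=model, conf=conf, calib_dataloader=calib_loader)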
Thank you!
For INT8 model inference, q_model(inputs) and q_model.model(inputs) should give the same result, I think. The INT8 model is q_model.model.
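As a quick sanity check, something like the sketch below can be used to compare the FP32 and INT8 members on the same data (this is an illustrative snippet, not an INC helper; evaluate and test_loader are placeholders for your own accuracy function and test dataloader):

import torch

def evaluate(model, dataloader):
    # simple top-1 accuracy loop; model is a plain torch.nn.Module
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in dataloader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

fp32_acc = evaluate(q_model.fp32_model, test_loader)  # original FP32 weights
int8_acc = evaluate(q_model.model, test_loader)       # quantised INT8 graph
print(f"FP32 acc: {fp32_acc:.4f}  INT8 acc: {int8_acc:.4f}")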
You can also use our save & load functions to get the pure INT8 model:
from neural_compressor.utils.pytorch import load

q_model.save('saved_results')          # save quantised configuration and weights

fp32_model = MODEL()                   # MODEL is a placeholder for your model class
fp32_model.eval()

int8_model = load('saved_results', fp32_model)  # rebuild the pure INT8 model
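After loading, the returned object should be a regular torch module, so inference can be run directly, for example (test_inputs is a placeholder batch):

int8_model.eval()
with torch.no_grad():
    outputs = int8_model(test_inputs)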
We haven't heard back for a while, so let's close this for now. Feel free to reopen if you need more help. Thank you!