
How to evaluate a quantised model in PyTorch


Hi,

The quantisation function neural_compressor/quantization/fit returns a PyTorchFXModel object, which contains two members, fp32_model and model. Could you please let me know what the correct way of evaluating the quantised model in PyTorch is? Following the examples in the documentation, is it correct to call q_model.forward(), or is there a helper function that takes the quantised model and the test data as inputs?

I am currently seeing large differences between q_model.fp32_model(inputs) and q_model(inputs) or q_model.model(inputs), and would like to understand whether this is the result of an incorrect PTQ configuration or an incorrect evaluation of the quantised model.
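For reference, a minimal sketch of the comparison I am running (assuming q_model is the object returned by fit and inputs is a single batch from my test set):

import torch

# Compare FP32 and INT8 outputs on the same test batch
with torch.no_grad():
    fp32_out = q_model.fp32_model(inputs)
    int8_out = q_model(inputs)  # also tried q_model.model(inputs)
    print((fp32_out - int8_out).abs().max())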

Thank you!

georgesterpu avatar Sep 01 '23 10:09 georgesterpu

For INT8 model inference, q_model(inputs) and q_model.model(inputs) should give the same result, I think; the INT8 model is q_model.model. You can also use our save & load functions to get the pure INT8 model:

from neural_compressor.utils.pytorch import load

# Save the quantized model, then reload the pure INT8 model into a fresh FP32 graph
q_model.save('saved_results')
fp32_model = MODEL()  # re-instantiate the original FP32 architecture
fp32_model.eval()
int8_model = load('saved_results', fp32_model)
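
After loading, the INT8 model can be evaluated like any other PyTorch module. A minimal accuracy sketch, assuming a classification task and a standard test_loader yielding (inputs, labels) batches:

import torch

# Run the reloaded INT8 model over the test set and compute top-1 accuracy
correct, total = 0, 0
with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = int8_model(inputs)
        preds = outputs.argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"INT8 top-1 accuracy: {correct / total:.4f}")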

xin3he avatar Sep 12 '23 03:09 xin3he

We haven't heard back for a while, so let's close it for now. Feel free to reopen if you need more help. Thank you!

thuang6 avatar Apr 29 '24 08:04 thuang6