neural-compressor
How to evaluate a quantised model in PyTorch
Hi,
The quantisation function neural_compressor.quantization.fit returns a PyTorchFXModel object, which contains two members, fp32_model and model. Could you please let me know the correct way of evaluating the quantised model in PyTorch? Following the examples in the documentation, is it correct to evaluate via q_model.forward(), or is there a helper function that takes the quantised model and the test data as inputs?
I am currently seeing large differences between the outputs of q_model.fp32_model(inputs) and q_model(inputs) (or q_model.model(inputs)), and would like to understand whether this is the result of an incorrect PTQ configuration or an incorrect evaluation of the quantised model.
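For reference, my PTQ setup is roughly the following (a minimal sketch assuming the 2.x API; MyModel and calib_loader are placeholders for my actual model class and calibration dataloader):

from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

model = MyModel()                 # placeholder for the actual FP32 model
model.eval()
conf = PostTrainingQuantConfig()  # default static PTQ configuration
q_model = fit(model=model, conf=conf, calib_dataloader=calib_loader)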
Thank you!
For INT8 model inference, q_model(inputs) and q_model.model(inputs) should give the same result, I think. The INT8 model is q_model.model.
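As a quick sanity check, something like the sketch below can be used to compare the FP32 and INT8 members on the same data (this is an illustrative snippet, not an INC helper; evaluate and test_loader are placeholders for your own accuracy function and test dataloader):

import torch

def evaluate(model, dataloader):
    # simple top-1 accuracy loop; model is a plain torch.nn.Module
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in dataloader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

fp32_acc = evaluate(q_model.fp32_model, test_loader)  # original FP32 weights
int8_acc = evaluate(q_model.model, test_loader)       # quantised INT8 graph
print(f"FP32 acc: {fp32_acc:.4f}  INT8 acc: {int8_acc:.4f}")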
You can also use our save & load functions to get the pure INT8 model:
from neural_compressor.utils.pytorch import load

q_model.save('saved_results')          # save quantised configuration and weights

fp32_model = MODEL()                   # MODEL is a placeholder for your model class
fp32_model.eval()

int8_model = load('saved_results', fp32_model)  # rebuild the pure INT8 model
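After loading, the returned object should be a regular torch module, so inference can be run directly, for example (test_inputs is a placeholder batch):

int8_model.eval()
with torch.no_grad():
    outputs = int8_model(test_inputs)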
We haven't heard back for a while, so let's close this for now. Feel free to reopen if you need more help. Thank you!