neural-compressor
How to evaluate quantised model in Pytorch
Hi,

The quantisation function in neural_compressor/quantization/fit returns a PyTorchFXModel object, which contains two members, fp32_model and model. Could you please let me know the correct way of evaluating the quantised model in PyTorch? Following the examples in the documentation, is it correct to call q_model.forward(), or is there a helper function that takes the quantised model and the test data as inputs?

I am currently seeing large differences between q_model.fp32_model(inputs) and q_model(inputs) (or q_model.model(inputs)), and would like to understand whether this is the result of an incorrect PTQ configuration or an incorrect evaluation of the quantised model.

Thank you!
For INT8 model inference, q_model(inputs) == q_model.model(inputs), I think. The INT8 model is q_model.model.
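As a minimal sketch of how you might compare the two, assuming a classification task, a DataLoader named test_loader, and a simple accuracy metric (all of which are placeholders for your own pipeline, not part of the neural_compressor API):

import torch

# Hypothetical evaluation loop: run a model over the test set and return top-1 accuracy.
def evaluate(model, test_loader):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = model(inputs)
            preds = outputs.argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

fp32_acc = evaluate(q_model.fp32_model, test_loader)
int8_acc = evaluate(q_model.model, test_loader)  # equivalent to calling q_model directly
print(f"fp32 accuracy: {fp32_acc:.4f}, int8 accuracy: {int8_acc:.4f}")

A small accuracy drop between the two is expected after PTQ; a large gap usually points to a calibration or configuration issue rather than an evaluation mistake.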
You can also use our save & load functions to get the pure INT8 model:

from neural_compressor.utils.pytorch import load

# Save the quantised model produced by fit()
q_model.save('saved_results')

# Rebuild the original fp32 model architecture (MODEL() is a placeholder for your model class)
fp32_model = MODEL()
fp32_model.eval()

# Load the saved INT8 state back onto the fp32 model
int8_model = load('saved_results', fp32_model)
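As an illustrative follow-up (not from the original thread), you could sanity-check that the reloaded model behaves like the in-memory one; the input shape below is just an example:

import torch

# Hypothetical smoke test: the reloaded INT8 model should produce the same outputs as q_model.
dummy_input = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    out_a = q_model(dummy_input)
    out_b = int8_model(dummy_input)
print(torch.allclose(out_a, out_b))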
We haven't heard back for a while, so let's close this for now. Feel free to reopen if you need more help. Thank you!