ZeroQ
Increased inference latency for the quantized model
I have just reproduced the classification results on ResNet-50 + ImageNet. The accuracy is excellent!
But there is a significant increase in inference latency for the quantized model. Test results on ResNet-50 + ImageNet with a Tesla T4:
- `test(model, test_loader)` takes 143 seconds
- `test(quantized_model, test_loader)` takes 1442 seconds
Has anybody hit the same issue?
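
For reference, here is roughly how I timed both passes. This is a minimal sketch assuming a standard PyTorch eval loop; the `time_inference` helper below is my own, not ZeroQ's `test` function, but it measures the same thing:

```python
import time
import torch

def time_inference(model, test_loader, device="cuda"):
    # Hypothetical helper (not part of ZeroQ): times one full pass
    # over the test set, the same workload as the repo's test() loop.
    model = model.to(device).eval()
    torch.cuda.synchronize()   # drain any pending GPU work before starting the clock
    start = time.time()
    with torch.no_grad():
        for images, _ in test_loader:
            images = images.to(device)
            _ = model(images)
    torch.cuda.synchronize()   # wait for queued kernels before stopping the clock
    return time.time() - start

# Usage: compare the two models on the same loader, e.g.
# print(time_inference(model, test_loader))
# print(time_inference(quantized_model, test_loader))
```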