Query: How to run BERT INT8 TF model
Hi,
I am trying to run BERT with INT8 precision using the TF backend, but I don't see any TF INT8 model info at the link below.
https://github.com/mlcommons/inference/tree/master/language/bert
Any help on how to run it would be highly appreciated.
Not all quantized models are included in the inference repo. You're free to take any fp32 model and apply your own quantization method. More details regarding retraining can be found here
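As a starting point for "do your own quantization", here is a minimal sketch of post-training INT8 quantization with the TFLite converter. The tiny Keras model and random calibration data are placeholders, not the actual MLCommons BERT workflow; for BERT you would load the fp32 SavedModel from the repo above and calibrate with real tokenized inputs.

```python
import numpy as np
import tensorflow as tf

# Placeholder fp32 model standing in for BERT (assumption for this sketch).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),
])

def representative_dataset():
    # Calibration samples drive the INT8 range estimation;
    # replace with real input samples for BERT.
    for _ in range(16):
        yield [np.random.rand(1, 128).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to full-integer ops so weights and activations are INT8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_int8 = converter.convert()
print(len(tflite_int8))
```

The resulting flatbuffer can be run with the TFLite interpreter; accuracy should then be checked against the MLPerf target before submitting results.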