Query: How to run BERT INT8 TF model
Hi,
I am trying to run BERT with INT8 precision using the TF backend, but I don't see any TF INT8 model info at the link below.
https://github.com/mlcommons/inference/tree/master/language/bert
Any help on how to run it would be highly appreciated.
Not all quantized models are included in the inference repo. You're free to take any fp32 model and apply your own quantization method. More details regarding retraining can be found here
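As a starting point for "do your own quantization", here is a minimal sketch of post-training INT8 quantization with the TFLite converter. The tiny Keras model and random calibration data are placeholders, not the actual MLCommons BERT workflow; for BERT you would load the fp32 SavedModel from the repo above and calibrate with real tokenized inputs.

```python
import numpy as np
import tensorflow as tf

# Placeholder fp32 model standing in for BERT (assumption for this sketch).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),
])

def representative_dataset():
    # Calibration samples drive the INT8 range estimation;
    # replace with real input samples for BERT.
    for _ in range(16):
        yield [np.random.rand(1, 128).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to full-integer ops so weights and activations are INT8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_int8 = converter.convert()
print(len(tflite_int8))
```

The resulting flatbuffer can be run with the TFLite interpreter; accuracy should then be checked against the MLPerf target before submitting results.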