yolo2_light
INT8 implementation reference
Hi, @AlexeyAB
I'd like to know more about how the INT8 version is implemented. Is it based on one or more papers? Could you give related links for reference?
Thanks
@trustin77 Hi,
I have not seen step-by-step instructions on how to do this. I used these documents:
- How Float-32 is converted to INT-8 in TensorRT: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
- How to use `CUDNN_DATA_INT8x4` in cuDNN: https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnConvolutionForward
- How to convert `CUDNN_TENSOR_NCHW` & INT8 to `CUDNN_TENSOR_NCHW_VECT_C` & INT8x4: https://devtalk.nvidia.com/default/topic/1028139/cudnn/how-to-reduce-time-spent-in-transforming-tensors-using-cudnnv6-0-for-api-cudnntransformtensor-/post/5264978/#5264978
- About optimal input_calibration: https://github.com/AlexeyAB/yolo2_light/issues/24#issuecomment-435361415
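The TensorRT slides above describe symmetric linear quantization: pick a calibration threshold per layer, map `[-threshold, +threshold]` onto `[-127, +127]`, and saturate values outside that range. A minimal sketch of that scheme (the `threshold` value here is an arbitrary example, not one taken from a real calibration run):

```python
import numpy as np

def quantize_int8(x, threshold):
    """Symmetric linear quantization: map [-threshold, threshold] to
    [-127, 127], saturating values beyond the calibration threshold."""
    scale = 127.0 / threshold
    q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return q.astype(np.float32) / scale

# Example: activations, with an assumed calibration threshold of 4.0;
# the 8.0 entry lies beyond the threshold and saturates to 127.
acts = np.array([0.5, -2.0, 3.9, 8.0], dtype=np.float32)
q, scale = quantize_int8(acts, threshold=4.0)
recovered = dequantize(q, scale)
```

Choosing the threshold per layer (e.g. by minimizing KL divergence between the float and quantized activation distributions, as TensorRT does) is what the `input_calibration` values approximate.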
Also about quantization:
- Yolo v2 INT8 - too high a reduction of accuracy: http://cs231n.stanford.edu/reports/2017/pdfs/808.pdf
- Optimal quantization is INT 4-bit: https://arxiv.org/abs/1510.00149
- XNOR 1-bit quantization - "This motivates us to avoid binarization at the first and last layer of a CNN": https://arxiv.org/abs/1603.05279
- MobileNet quantization: https://arxiv.org/abs/1712.05877
- Quantization of old models: https://arxiv.org/abs/1512.06473
- About XNOR: https://arxiv.org/abs/1807.03010
- Also about XNOR: https://arxiv.org/abs/1803.05849
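For the XNOR-Net paper linked above, the core idea is to approximate real-valued weights `W` as `alpha * B`, where `B = sign(W)` and `alpha` is the mean absolute value of the weights. A minimal sketch of that weight-binarization step (simplified to a single flat filter; the paper computes `alpha` per output filter):

```python
import numpy as np

def binarize_weights(w):
    """XNOR-Net-style binarization: approximate W ~ alpha * B,
    where B = sign(W) in {-1, +1} and alpha = mean(|W|)."""
    alpha = np.mean(np.abs(w))        # scalar scaling factor
    b = np.where(w >= 0, 1.0, -1.0)   # binary weights
    return alpha, b

# Example filter weights (illustrative values only)
w = np.array([0.4, -0.2, 0.1, -0.5], dtype=np.float32)
alpha, b = binarize_weights(w)
approx = alpha * b  # binary approximation of w
```

With both weights and inputs binarized, the convolution's dot products reduce to XNOR plus popcount operations, which is where the speedup comes from; as the quoted sentence notes, the first and last layers are usually kept in higher precision to limit the accuracy loss.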