int8 topic
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
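For orientation, a post-training INT8 quantization flow with a library like this can be as short as the sketch below. It assumes neural-compressor's 2.x-style Python API (PostTrainingQuantConfig and quantization.fit) and uses a torchvision ResNet-18 with random calibration data purely for illustration; consult the repo for the current interface.

```python
# Minimal post-training INT8 quantization sketch (assumes neural-compressor 2.x API).
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset

from neural_compressor import PostTrainingQuantConfig, quantization

# Any eval-mode PyTorch model works; ResNet-18 is just an example.
model = torchvision.models.resnet18(weights=None).eval()

# A tiny random calibration set; in practice use real preprocessed images.
calib_data = TensorDataset(torch.randn(32, 3, 224, 224),
                           torch.zeros(32, dtype=torch.long))
calib_loader = DataLoader(calib_data, batch_size=8)

# Static INT8 post-training quantization driven by the calibration loader.
conf = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(model=model, conf=conf, calib_dataloader=calib_loader)
q_model.save("./int8-resnet18")
```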
retinaface
Reimplementation of RetinaFace in C++ with TensorRT
yolov5_tensorrt_int8_tools
TensorRT INT8 quantization of a YOLOv5 ONNX model
yolov5_tensorrt_int8
TensorRT INT8 quantization and deployment of the YOLOv5s model; measured at 3.3 ms per frame!
RepVGG_TensorRT_int8
RepVGG TensorRT INT8 quantization; measured inference in under 1 ms per frame!
ncnn-yolov4-int8
NCNN + INT8 + YOLOv4 model quantization and real-time inference
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
Tensorrt-int8-quantization-pipline
A simple pipeline for INT8 quantization based on TensorRT.
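To make the shape of such a pipeline concrete, here is a minimal sketch against TensorRT's 8.x Python API: an IInt8EntropyCalibrator2 that feeds preprocessed batches, then an INT8 engine build from an ONNX file. The ONNX path, cache file name, input shape, and random calibration data are illustrative assumptions, not details taken from the repo.

```python
# Sketch of a TensorRT INT8 build with an entropy calibrator (TensorRT 8.x Python API).
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT and caches the scale table."""

    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)      # iterable of float32 NCHW arrays
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                   # signals end of calibration
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

def build_int8_engine(onnx_path, calibrator):
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calibrator
    return builder.build_serialized_network(network, config)

# Usage: calibrate on a handful of preprocessed frames, then serialize.
calib = EntropyCalibrator(np.random.rand(16, 1, 3, 640, 640).astype(np.float32))
engine_bytes = build_int8_engine("yolov5s.onnx", calib)
with open("yolov5s_int8.engine", "wb") as f:
    f.write(engine_bytes)
```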
YOLOv8-ONNX-TensorRT
👀 Apply YOLOv8, exported to ONNX or TensorRT (FP16, INT8), to a real-time camera feed
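As a rough picture of what "apply to a real-time camera" means in practice, the sketch below runs an exported ONNX model on webcam frames with OpenCV and ONNX Runtime. The model path, 640x640 input size, and the omission of box decoding and NMS are assumptions made for brevity; a full YOLOv8 deployment handles all of those.

```python
# Minimal real-time camera inference loop (OpenCV + ONNX Runtime sketch).
import cv2
import numpy as np
import onnxruntime as ort

# "yolov8n.onnx" is a placeholder path for an exported YOLOv8 model.
session = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

cap = cv2.VideoCapture(0)  # default camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # BGR -> RGB, resize to the assumed 640x640 input, scale to [0, 1].
    rgb = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)
    blob = (rgb.astype(np.float32) / 255.0).transpose(2, 0, 1)[None]  # NCHW
    outputs = session.run(None, {input_name: blob})
    # Box decoding, NMS, and drawing are omitted in this sketch.
    cv2.imshow("camera", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```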