int8 topic
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
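For orientation, a post-training INT8 quantization flow with a library like this can be as short as the sketch below. It assumes neural-compressor's 2.x-style Python API (PostTrainingQuantConfig and quantization.fit) and uses a torchvision ResNet-18 with random calibration data purely for illustration; consult the repo for the current interface.

```python
# Minimal post-training INT8 quantization sketch (assumes neural-compressor 2.x API).
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset

from neural_compressor import PostTrainingQuantConfig, quantization

# Any eval-mode PyTorch model works; ResNet-18 is just an example.
model = torchvision.models.resnet18(weights=None).eval()

# A tiny random calibration set; in practice use real preprocessed images.
calib_data = TensorDataset(torch.randn(32, 3, 224, 224),
                           torch.zeros(32, dtype=torch.long))
calib_loader = DataLoader(calib_data, batch_size=8)

# Static INT8 post-training quantization driven by the calibration loader.
conf = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(model=model, conf=conf, calib_dataloader=calib_loader)
q_model.save("./int8-resnet18")
```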
retinaface
Reimplementation of RetinaFace in C++ with TensorRT
yolov5_tensorrt_int8_tools
TensorRT INT8 quantization of a YOLOv5 ONNX model
yolov5_tensorrt_int8
TensorRT INT8 quantization and deployment of the YOLOv5s model; measured at 3.3 ms per frame!
RepVGG_TensorRT_int8
RepVGG TensorRT INT8 quantization; measured inference in under 1 ms per frame!
ncnn-yolov4-int8
NCNN + INT8 + YOLOv4 model quantization and real-time inference
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
Tensorrt-int8-quantization-pipline
A simple pipeline for INT8 quantization based on TensorRT.
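To make the shape of such a pipeline concrete, here is a minimal sketch against TensorRT's 8.x Python API: an IInt8EntropyCalibrator2 that feeds preprocessed batches, then an INT8 engine build from an ONNX file. The ONNX path, cache file name, input shape, and random calibration data are illustrative assumptions, not details taken from the repo.

```python
# Sketch of a TensorRT INT8 build with an entropy calibrator (TensorRT 8.x Python API).
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT and caches the scale table."""

    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)      # iterable of float32 NCHW arrays
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                   # signals end of calibration
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

def build_int8_engine(onnx_path, calibrator):
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calibrator
    return builder.build_serialized_network(network, config)

# Usage: calibrate on a handful of preprocessed frames, then serialize.
calib = EntropyCalibrator(np.random.rand(16, 1, 3, 640, 640).astype(np.float32))
engine_bytes = build_int8_engine("yolov5s.onnx", calib)
with open("yolov5s_int8.engine", "wb") as f:
    f.write(engine_bytes)
```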
YOLOv8-ONNX-TensorRT
👀 Apply YOLOv8, exported to ONNX or TensorRT (FP16, INT8), to a real-time camera feed
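As a rough picture of what "apply to a real-time camera" means in practice, the sketch below runs an exported ONNX model on webcam frames with OpenCV and ONNX Runtime. The model path, 640x640 input size, and the omission of box decoding and NMS are assumptions made for brevity; a full YOLOv8 deployment handles all of those.

```python
# Minimal real-time camera inference loop (OpenCV + ONNX Runtime sketch).
import cv2
import numpy as np
import onnxruntime as ort

# "yolov8n.onnx" is a placeholder path for an exported YOLOv8 model.
session = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

cap = cv2.VideoCapture(0)  # default camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # BGR -> RGB, resize to the assumed 640x640 input, scale to [0, 1].
    rgb = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)
    blob = (rgb.astype(np.float32) / 255.0).transpose(2, 0, 1)[None]  # NCHW
    outputs = session.run(None, {input_name: blob})
    # Box decoding, NMS, and drawing are omitted in this sketch.
    cv2.imshow("camera", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```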