fp8 topic

List fp8 repositories

TransformerEngine

Stars: 1.8k
Forks: 306

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization i...

MS-AMP

Stars: 510
Forks: 42

Microsoft Automatic Mixed Precision Library

neural-speed

Stars: 346
Forks: 38

An innovative library for efficient LLM inference via low-bit quantization

flux-fp8-api

Stars: 264
Forks: 37

Flux diffusion model implementation that uses quantized fp8 matmul, with the remaining layers using faster half-precision accumulate, which is ~2x faster on consumer devices.
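
To make the idea of a "quantized fp8 matmul" more concrete, here is a minimal pure-Python sketch of per-tensor scaled FP8 (E4M3) quantization applied to a dot product. It is not taken from flux-fp8-api or any other repository listed here. The helper names (round_to_e4m3, quantize, fp8_dot) are hypothetical, the per-tensor scaling scheme is an assumption, and accumulation is done in ordinary Python floats rather than half precision.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**ex with 0.5 <= m < 1, so the binade exponent is ex - 1.
    _, ex = math.frexp(a)
    e = max(ex - 1, -6)    # exponents below -6 fall in the subnormal range
    step = 2.0 ** (e - 3)  # 3 mantissa bits give 8 steps per binade
    # Python's round() rounds halves to even, matching round-to-nearest-even.
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize(values):
    """Hypothetical per-tensor scaling: map the largest magnitude onto E4M3_MAX."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale


def fp8_dot(a, b):
    """Dot product of two vectors after FP8 quantization of both inputs."""
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    # Assumption: accumulate in Python floats, then undo both scales.
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc / (sa * sb)


if __name__ == "__main__":
    print(fp8_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # exact answer is 32
```

The result lands close to the exact value but not on it, since each input keeps only three mantissa bits. Real FP8 kernels trade that small error for lower memory traffic and higher matmul throughput.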