mistral.rs
Accelerate quantization with Marlin kernel or HQQ
The Marlin INT4xFP16 CUDA matmul kernel can achieve roughly a 4x speedup over the CUTLASS FP16 matmul.
See also HQQ, a quantization method that supports Marlin and other optimized kernels, requires no calibration data, and can quantize a model in minutes.
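As an illustration, a quantized model might be served like this. This is a sketch only: the exact subcommand, flag names, and accepted ISQ type strings depend on your mistral.rs version, so verify them against `mistralrs-server --help` before use; the model ID is a placeholder.

```shell
# Hypothetical invocation -- flag and subcommand names may differ by version.
# In-situ quantize (ISQ) the weights at load time; an HQQ 4-bit type needs no
# calibration data, and Marlin-compatible INT4 layouts can use the fast kernel.
mistralrs-server --isq HQQ4 plain -m mistralai/Mistral-7B-Instruct-v0.2
```

If the chosen quantization layout is Marlin-compatible, the INT4xFP16 kernel can be used for the matmuls at inference time.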