mistral.rs

Accelerate quantization with Marlin kernel or HQQ

Open EricLBuehler opened this issue 1 year ago • 0 comments

The Marlin INT4xFP16 CUDA matmul kernel can achieve ~4x speed improvement over CUTLASS matmul.
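For context, here is a rough sketch of what an INT4 x floating-point product involves: weights are stored as 4-bit codes (two per byte) with a per-group scale and zero point, and are dequantized on the fly while multiplying against higher-precision activations. This is only an illustration on the CPU, not Marlin's actual kernel or memory layout. The names `Int4Group`, `quantize_group`, and `dot_int4` are hypothetical, and `f32` stands in for FP16 because stable Rust's standard library has no half-precision type.

```rust
/// One group of 4-bit weights: two codes per byte, plus a per-group
/// scale and zero point (hypothetical layout, not Marlin's).
pub struct Int4Group {
    pub packed: Vec<u8>,
    pub scale: f32,
    pub zero: f32,
}

/// Asymmetric min/max quantization of one weight group to 4 bits.
pub fn quantize_group(weights: &[f32]) -> Int4Group {
    let min = weights.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = weights.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // 4 bits give 16 levels (0..=15); fall back to scale 1.0 for a constant group.
    let scale = if max > min { (max - min) / 15.0 } else { 1.0 };
    let quant = |w: f32| ((w - min) / scale).round().clamp(0.0, 15.0) as u8;
    let packed = weights
        .chunks(2)
        .map(|pair| {
            let lo = quant(pair[0]);
            let hi = if pair.len() > 1 { quant(pair[1]) } else { 0 };
            lo | (hi << 4) // low nibble holds the even index, high nibble the odd one
        })
        .collect();
    Int4Group { packed, scale, zero: min }
}

/// Dot product of a quantized weight group with full-precision activations,
/// unpacking and dequantizing each 4-bit code as it is consumed.
pub fn dot_int4(group: &Int4Group, activations: &[f32]) -> f32 {
    activations
        .iter()
        .enumerate()
        .map(|(i, &x)| {
            let byte = group.packed[i / 2];
            let q = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            (q as f32 * group.scale + group.zero) * x
        })
        .sum()
}
```

The appeal of this scheme is that the weights occupy a quarter of the memory of FP16, so a memory-bound matmul has up to 4x less weight data to move; a GPU kernel would do the same unpack-and-scale step in registers.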

See also: HQQ, a quantization method that supports Marlin and other optimized kernels, needs no calibration data, and can quantize a model in minutes.

EricLBuehler, Jun 18 '24 20:06