GPTQ Quantization needs `use_marlin`
Feature request
Refer to the AutoGPTQ README (https://github.com/AutoGPTQ/AutoGPTQ/blob/main/README.md):
> 2024-02-15 - (News) - AutoGPTQ 0.7.0 is released, with [Marlin](https://github.com/IST-DASLab/marlin) int4*fp16 matrix multiplication kernel support, enabled with the argument `use_marlin=True` when loading models.
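For reference, here is a minimal sketch of loading an already-quantized checkpoint through AutoGPTQ 0.7.0 with the Marlin kernel enabled (the model id is only a placeholder):

```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder: any 4-bit, symmetric, group_size 128 GPTQ checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
# use_marlin=True selects the Marlin int4*fp16 kernel (Ampere+ GPUs, 4-bit symmetric quantization)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0", use_marlin=True)

inputs = tokenizer("Marlin kernel test:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```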
The GPTQ quantizer in optimum (https://github.com/huggingface/optimum/blob/main/optimum/gptq/quantizer.py) needs a kernel-choice option so that the Marlin kernel can be selected.
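A hypothetical sketch of what the requested option could look like on optimum's `GPTQQuantizer`; the `use_marlin` argument shown here does not exist yet and only illustrates the proposal:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# `use_marlin` is hypothetical: today the quantizer only exposes the exllama kernel options
quantizer = GPTQQuantizer(bits=4, group_size=128, sym=True, dataset="c4", use_marlin=True)
quantized_model = quantizer.quantize_model(model, tokenizer)
```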
Motivation
See the benchmarks comparing the different AutoGPTQ kernels: https://github.com/huggingface/optimum/blob/main/tests/benchmark/README.md
Your contribution
I can open a PR if needed.
@wanghaichen1 Try GPTQModel, where we monkey-patch the HF integration so that it replaces AutoGPTQ.
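A minimal sketch, assuming GPTQModel keeps AutoGPTQ's `from_quantized`-style loading API (check the GPTQModel README for the exact interface and kernel/backend selection):

```python
# Sketch only: assumes GPTQModel exposes an AutoGPTQ-compatible from_quantized loader
from gptqmodel import GPTQModel

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder 4-bit GPTQ checkpoint
model = GPTQModel.from_quantized(model_id, device="cuda:0")
```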