lmdeploy
[Feature] Support QuaRot quantization scheme
Motivation
QuaRot (https://arxiv.org/abs/2404.00456) has been out for three weeks, and the preliminary results are convincing. See also the discussion with the QuaRot authors in llama.cpp.
It would be amazing to have it supported in LMDeploy by default.
Best.
Related resources
https://github.com/ggerganov/llama.cpp/issues/6444
https://arxiv.org/abs/2404.00456
Additional context
No response
@pppppM @AllentDan @lzhangzz may investigate the QuaRot quantization algorithm; it looks very promising.