[Question] Could HQQ (Half-Quadratic Quantization) improve the size, speed, or quality of MLC LLM models?

Open yieme opened this issue 11 months ago • 1 comments

❓ General Questions

As I understand it, Half-Quadratic Quantization (HQQ) is a new technique for quantizing models that reduces their memory requirements, making them easier to deploy. I'm wondering whether it could help MLC-LLM extend its reach.

ref: Half-Quadratic Quantization

yieme, Mar 09 '24 22:03

+1

Uralstech, Apr 01 '24 08:04