mlc-llm
[Question] Would HQQ (Half-Quadratic Quantization) improve the size, speed, or quality of MLC LLM models?
❓ General Questions
As I understand it, Half-Quadratic Quantization (HQQ) is a new technique for quantizing models that reduces their memory requirements, making them easier to deploy. I'm wondering whether this is something that could help MLC-LLM improve its reach.
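For context, a minimal sketch of the kind of low-bit weight quantization being discussed. This is plain 4-bit asymmetric (min/max) quantization, not HQQ itself: HQQ's distinguishing feature is that it solves a half-quadratic optimization for the zero-points rather than using the naive min/max calibration shown here. All names below are illustrative, not MLC-LLM or HQQ APIs.

```python
import numpy as np

# Illustrative only: basic 4-bit asymmetric (affine) group quantization.
# Real HQQ replaces the simple min/max calibration below with a
# half-quadratic solver that optimizes the zero-points.

def quantize_4bit(w, group_size=64):
    """Quantize weights to 4-bit ints, one scale/zero per group."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0          # 4 bits -> integer levels 0..15
    zero = w_min
    q = np.clip(np.round((w - zero) / scale), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    return q.astype(np.float32) * scale + zero

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
q, scale, zero = quantize_4bit(w)
w_hat = dequantize(q, scale, zero).reshape(w.shape)

fp32_bytes = w.nbytes                       # 4 bytes per fp32 weight
# 4-bit payload (two weights per byte) plus fp32 scale/zero per group
int4_bytes = q.size // 2 + scale.nbytes + zero.nbytes
print(f"fp32: {fp32_bytes} B, ~4-bit: {int4_bytes} B "
      f"({fp32_bytes / int4_bytes:.1f}x smaller)")
print(f"mean abs reconstruction error: {np.abs(w - w_hat).mean():.4f}")
```

The memory win comes from storing 4-bit codes plus small per-group metadata instead of fp32; HQQ's contribution is recovering more accuracy at the same bit width by optimizing the quantization parameters, which is why it is attractive for deployment-focused stacks like MLC-LLM.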
+1