mlc-llm
[Question] Would HQQ (Half-Quadratic Quantization) improve the size, speed, or quality of MLC LLM models?
❓ General Questions
As I understand it, Half-Quadratic Quantization (HQQ) is a new technique for quantizing models that reduces their memory requirements, making them easier to deploy. I'm wondering whether this is something that could help MLC-LLM improve its reach.
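For context, a minimal sketch of the kind of low-bit weight quantization being discussed. This is plain 4-bit asymmetric (min/max) quantization, not HQQ itself: HQQ's distinguishing feature is that it solves a half-quadratic optimization for the zero-points rather than using the naive min/max calibration shown here. All names below are illustrative, not MLC-LLM or HQQ APIs.

```python
import numpy as np

# Illustrative only: basic 4-bit asymmetric (affine) group quantization.
# Real HQQ replaces the simple min/max calibration below with a
# half-quadratic solver that optimizes the zero-points.

def quantize_4bit(w, group_size=64):
    """Quantize weights to 4-bit ints, one scale/zero per group."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0          # 4 bits -> integer levels 0..15
    zero = w_min
    q = np.clip(np.round((w - zero) / scale), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    return q.astype(np.float32) * scale + zero

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
q, scale, zero = quantize_4bit(w)
w_hat = dequantize(q, scale, zero).reshape(w.shape)

fp32_bytes = w.nbytes                       # 4 bytes per fp32 weight
# 4-bit payload (two weights per byte) plus fp32 scale/zero per group
int4_bytes = q.size // 2 + scale.nbytes + zero.nbytes
print(f"fp32: {fp32_bytes} B, ~4-bit: {int4_bytes} B "
      f"({fp32_bytes / int4_bytes:.1f}x smaller)")
print(f"mean abs reconstruction error: {np.abs(w - w_hat).mean():.4f}")
```

The memory win comes from storing 4-bit codes plus small per-group metadata instead of fp32; HQQ's contribution is recovering more accuracy at the same bit width by optimizing the quantization parameters, which is why it is attractive for deployment-focused stacks like MLC-LLM.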
+1