AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
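
For context, a minimal quantization sketch in the style of the project's README; the model path, output path, and quantization settings here are illustrative assumptions rather than values taken from any issue below:

```python
# Minimal AWQ quantization sketch (paths and settings are assumptions, not from this page)
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # hypothetical source model
quant_path = "mistral-7b-instruct-awq"             # hypothetical output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration/quantization, then save the 4-bit checkpoint
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```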

Results: 234 AutoAWQ issues

Based on the suggestion in https://github.com/casper-hansen/AutoAWQ/issues/390, we have implemented inference of AWQ models on CPU devices. This PR adds support for weight-only quantization on CPU devices and inference with...

Can you please provide support for DeepSeek-V2 (deepseek-ai/DeepSeek-V2-Chat)? https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat

I used the example script in the README to quantize Llama-3-8B:

```python
quant_config = { "zero_point": True, "q_group_size": 16, "w_bit": 4, "version": "GEMM" }
model = AutoAWQForCausalLM.from_pretrained(model_path, **{"low_cpu_mem_usage": True}, ...
```
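
For reference, loading the resulting checkpoint for generation typically follows the project's example scripts; the sketch below is hedged, with `quant_path` standing in for a hypothetical directory produced by `model.save_quantized(...)` and a placeholder prompt:

```python
# Sketch of inference with a saved AWQ checkpoint (quant_path and prompt are placeholders)
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "llama-3-8b-awq"  # hypothetical output of model.save_quantized(...)

model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

tokens = tokenizer("What is AWQ quantization?", return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```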

@casper-hansen Thank you for your invitation. This PR introduces support for Phi-3 in AutoAWQ. Since Phi-3 has not yet been released in the transformers package, I conducted...

I have downloaded a model. Now, on my 4-GPU instance, I am attempting to quantize it using AutoAWQ. Whenever I run the script below, I get 0% GPU utilization. Can...

Any thoughts or suggestions would be appreciated. Thanks in advance.

Does AutoAWQ plan to support JAIS model quantization? https://huggingface.co/core42/jais-30b-v3 https://huggingface.co/core42/jais-30b-chat-v3

After using AutoAWQ to quantize my fine-tuned version of Qwen1.5-72B, I ran two tests: 1. a perplexity (PPL) run after quantization (test 1); 2. a HumanEval run (test 2). For...
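
As a rough illustration of what a post-quantization PPL check might look like (purely a sketch: the checkpoint path and evaluation text are assumptions, not the reporter's setup, and it assumes a transformers version with built-in AWQ checkpoint loading):

```python
# Rough post-quantization perplexity check on a tiny text sample (illustrative only)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

quant_path = "qwen1.5-72b-awq"  # hypothetical quantized checkpoint directory

model = AutoModelForCausalLM.from_pretrained(
    quant_path, device_map="auto", torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

text = "Quantization trades a small amount of accuracy for memory and speed."
ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    # Labels equal to the inputs yield the standard causal-LM cross-entropy loss
    loss = model(ids, labels=ids).loss
print("perplexity:", torch.exp(loss).item())
```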

Not the most powerful, but a useful model: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct