AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
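
For context, a minimal quantization sketch in the style of the project's README; the model path, output path, and quantization settings here are illustrative assumptions rather than values taken from any issue below:

```python
# Minimal AWQ quantization sketch (paths and settings are assumptions, not from this page)
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # hypothetical source model
quant_path = "mistral-7b-instruct-awq"             # hypothetical output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration/quantization, then save the 4-bit checkpoint
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```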

Results: 234 AutoAWQ issues

Based on the suggestion in https://github.com/casper-hansen/AutoAWQ/issues/390, we have implemented inference of AWQ models on CPU devices. This PR adds support for weight-only quantization on CPU devices and inference with...

Can you please provide support for DeepSeek-V2 (deepseek-ai/DeepSeek-V2-Chat)? https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat

I used the example script in the README to quantize Llama-3-8B:

```python
quant_config = { "zero_point": True, "q_group_size": 16, "w_bit": 4, "version": "GEMM" }
model = AutoAWQForCausalLM.from_pretrained(model_path, **{"low_cpu_mem_usage": True}, ...
```
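
For reference, loading the resulting checkpoint for generation typically follows the project's example scripts; the sketch below is hedged, with `quant_path` standing in for a hypothetical directory produced by `model.save_quantized(...)` and a placeholder prompt:

```python
# Sketch of inference with a saved AWQ checkpoint (quant_path and prompt are placeholders)
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "llama-3-8b-awq"  # hypothetical output of model.save_quantized(...)

model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

tokens = tokenizer("What is AWQ quantization?", return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```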

@casper-hansen Thank you for your invitation. This PR introduces support for Phi-3 in AutoAWQ. Since Phi-3 has not yet been released in the transformers package, I conducted...

I have downloaded a model. Now, on my 4-GPU instance, I am attempting to quantize it using AutoAWQ. Whenever I run the script below, I get 0% GPU utilization. Can...

Any thoughts or suggestions would be appreciated. Thanks in advance.

Does AutoAWQ plan to support JAIS model quantization? https://huggingface.co/core42/jais-30b-v3 https://huggingface.co/core42/jais-30b-chat-v3

After using AutoAWQ to quantize my fine-tuned version of Qwen1.5-72B, I ran two tests: 1. a perplexity (PPL) run after quantization (test 1); 2. a HumanEval run (test 2). For...
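
As a rough illustration of what a post-quantization PPL check might look like (purely a sketch: the checkpoint path and evaluation text are assumptions, not the reporter's setup, and it assumes a transformers version with built-in AWQ checkpoint loading):

```python
# Rough post-quantization perplexity check on a tiny text sample (illustrative only)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

quant_path = "qwen1.5-72b-awq"  # hypothetical quantized checkpoint directory

model = AutoModelForCausalLM.from_pretrained(
    quant_path, device_map="auto", torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

text = "Quantization trades a small amount of accuracy for memory and speed."
ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    # Labels equal to the inputs yield the standard causal-LM cross-entropy loss
    loss = model(ids, labels=ids).loss
print("perplexity:", torch.exp(loss).item())
```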

Not the most powerful, but a useful model: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct