Model quantize error
Hello. I am getting an error when running the sample below.
The requested file does not exist in the original model repo, so I copied and used the preprocessor_config.json from another model in the same family.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
model_path = 'OpenGVLab/ASMv2'
quant_path = 'ASMv2-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
# https://huggingface.co/liuhaotian/llava-v1.6-34b-tokenizer/blob/main/preprocessor_config.json
# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, device_map="cuda", low_cpu_mem_usage=True, safetensors=False
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Quantize
model.quantize(tokenizer, quant_config=quant_config)
# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')
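For completeness, once the script above runs without errors, the saved model can be loaded back for inference with AutoAWQ's from_quantized. A minimal sketch, assuming the same quant_path and that layer fusing is left off for this model family:
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = 'ASMv2-awq'

# Load the quantized weights saved by the script above.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)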
The source location where the error occurs is shown below.
# AutoAWQ/awq/quantize/quantizer.py, line 407
if best_ratio == -1:
    logging.debug(history)
    raise Exception
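For context, this check sits at the end of AutoAWQ's scale search for a layer: the quantizer sweeps a grid of scaling ratios, records the reconstruction loss for each, and keeps the ratio with the lowest loss. If every candidate loss comes back as NaN or inf (for example when the layer outputs already contain NaNs), the best ratio is never updated from -1 and the bare raise Exception above fires. The sketch below only illustrates that pattern; it is not the actual AutoAWQ implementation, and the names n_grid and compute_loss are assumptions.
def search_best_ratio(compute_loss, n_grid=20):
    """Simplified illustration of the grid search around quantizer.py line 407.

    compute_loss(ratio) stands in for AutoAWQ's reconstruction-loss measurement
    at a given scaling ratio.
    """
    history = []
    best_ratio = -1
    best_error = float("inf")

    for i in range(n_grid):
        ratio = i / n_grid
        loss = compute_loss(ratio)
        history.append(loss)
        # A NaN loss compares False against everything, and an inf loss cannot
        # beat the initial inf, so if every loss is NaN/inf best_ratio stays -1.
        if loss < best_error:
            best_error = loss
            best_ratio = ratio

    if best_ratio == -1:
        # The condition the traceback points at: no ratio produced a usable loss.
        raise Exception(f"no valid ratio found, loss history: {history}")
    return best_ratio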
I am facing the same issue, quantizing gemma-2-27B with
model.quantize(
    tokenizer,
    quant_config=quant_config,
    calib_data=data_final,
    max_calib_seq_len=4096,
    max_calib_samples=256,
    n_parallel_calib_samples=10,
)
It fails after 35% of the steps are completed.
@casper-hansen any idea how to fix this?
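As a side note on the call above: data_final is not shown in this thread, but the calib_data argument of quantize accepts either a dataset name or a list of raw text strings. A hypothetical sketch of building such a list; the file name calib.txt and the length filter are assumptions:
# Hypothetical preparation of calibration text; model.quantize() accepts
# calib_data as a list of strings in addition to a dataset name.
with open("calib.txt", encoding="utf-8") as f:
    data_final = [line.strip() for line in f if len(line.strip()) > 200]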
@casper-hansen do you think it could be a gemma-2 model support issue? I am currently building AutoAWQ from the main branch of this repo.
Same for me, it failed after 79/80...
Please let me know if #668 fixes this! I am actively working to fix any bugs caused by changes in Hugging Face Transformers.