Model quantize error
Hello. I am getting an error when running the sample below.
The requested file does not exist in the original model repo, so I copied and used the preprocessor_config.json from another model in the same family.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
model_path = 'OpenGVLab/ASMv2'
quant_path = 'ASMv2-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
# https://huggingface.co/liuhaotian/llava-v1.6-34b-tokenizer/blob/main/preprocessor_config.json
# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, device_map="cuda", low_cpu_mem_usage=True, safetensors=False
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Quantize
model.quantize(tokenizer, quant_config=quant_config)
# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')
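For completeness, once the script above runs without errors, the saved model can be loaded back for inference with AutoAWQ's from_quantized. A minimal sketch, assuming the same quant_path and that layer fusing is left off for this model family:
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = 'ASMv2-awq'

# Load the quantized weights saved by the script above.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)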
The source location where the error occurs is shown below.
# AutoAWQ/awq/quantize/quantizer.py, line 407
if best_ratio == -1:
    logging.debug(history)
    raise Exception
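For context, this check sits at the end of AutoAWQ's scale search for a layer: the quantizer sweeps a grid of scaling ratios, records the reconstruction loss for each, and keeps the ratio with the lowest loss. If every candidate loss comes back as NaN or inf (for example when the layer outputs already contain NaNs), the best ratio is never updated from -1 and the bare raise Exception above fires. The sketch below only illustrates that pattern; it is not the actual AutoAWQ implementation, and the names n_grid and compute_loss are assumptions.
def search_best_ratio(compute_loss, n_grid=20):
    """Simplified illustration of the grid search around quantizer.py line 407.

    compute_loss(ratio) stands in for AutoAWQ's reconstruction-loss measurement
    at a given scaling ratio.
    """
    history = []
    best_ratio = -1
    best_error = float("inf")

    for i in range(n_grid):
        ratio = i / n_grid
        loss = compute_loss(ratio)
        history.append(loss)
        # A NaN loss compares False against everything, and an inf loss cannot
        # beat the initial inf, so if every loss is NaN/inf best_ratio stays -1.
        if loss < best_error:
            best_error = loss
            best_ratio = ratio

    if best_ratio == -1:
        # The condition the traceback points at: no ratio produced a usable loss.
        raise Exception(f"no valid ratio found, loss history: {history}")
    return best_ratio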
I am facing the same issue, quantizing gemma-2-27B with
model.quantize(
    tokenizer,
    quant_config=quant_config,
    calib_data=data_final,
    max_calib_seq_len=4096,
    max_calib_samples=256,
    n_parallel_calib_samples=10,
)
It fails after 35% of the steps are completed.
@casper-hansen any idea how to fix this?
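As a side note on the call above: data_final is not shown in this thread, but the calib_data argument of quantize accepts either a dataset name or a list of raw text strings. A hypothetical sketch of building such a list; the file name calib.txt and the length filter are assumptions:
# Hypothetical preparation of calibration text; model.quantize() accepts
# calib_data as a list of strings in addition to a dataset name.
with open("calib.txt", encoding="utf-8") as f:
    data_final = [line.strip() for line in f if len(line.strip()) > 200]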
@casper-hansen do you think it could be a gemma-2 model support issue? I am currently building AutoAWQ from the main branch of this repo.
Same for me, it failed after 79/80...
Please let me know if #668 fixes this! I am actively working to fix any bugs caused by changes in Hugging Face Transformers.