AutoAWQ
AWQ int4 to GGUF: ModuleNotFoundError: No module named 'awq.apply_awq'
I want to quantize a model with AWQ and then convert it to GGUF with llama.cpp. I followed the tutorial, but I got this error:

Traceback (most recent call last):
  File "/root/ld/ld_project/llama.cpp/convert_minicpm.py", line 2516, in
ModuleNotFoundError: No module named 'awq.apply_awq'
My versions are autoawq 0.2.5+cu121 and autoawq_kernels 0.0.6.
This is my code:

import os
import subprocess

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'mistralai/Mistral-7B-v0.1'
quant_path = 'mistral-awq'
llama_cpp_path = '/workspace/llama.cpp'
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 6, "version": "GEMM"}

# Load model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(
    model_path, **{"low_cpu_mem_usage": True, "use_cache": False}
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize (export_compatible keeps the weights in a convertible format)
model.quantize(tokenizer, quant_config=quant_config, export_compatible=True)

# Save the quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')

# GGUF conversion
print('Converting model to GGUF...')
llama_cpp_method = "q4_K_M"
convert_cmd_path = os.path.join(llama_cpp_path, "convert.py")
quantize_cmd_path = os.path.join(llama_cpp_path, "quantize")

if not os.path.exists(llama_cpp_path):
    cmd = (
        f"git clone https://github.com/ggerganov/llama.cpp.git {llama_cpp_path} "
        f"&& cd {llama_cpp_path} && make LLAMA_CUBLAS=1 LLAMA_CUDA_F16=1"
    )
    subprocess.run([cmd], shell=True, check=True)

subprocess.run(
    [f"python {convert_cmd_path} {quant_path} --outfile {quant_path}/model.gguf"],
    shell=True, check=True,
)

subprocess.run(
    [f"{quantize_cmd_path} {quant_path}/model.gguf {quant_path}/model_{llama_cpp_method}.gguf {llama_cpp_method}"],
    shell=True, check=True,
)
Hi @LDLINGLINGLING. The traceback in your first message seems to come from llama.cpp rather than from this package. Have you tried the GGUF export example from the AutoAWQ documentation, and did it succeed?
https://casper-hansen.github.io/AutoAWQ/examples/#gguf-export
It didn't succeed. I followed the instructions at https://casper-hansen.github.io/AutoAWQ/examples/#gguf-export, but the error at the top appeared.
I now think this operation is pointless. I originally chose AWQ for its high quantization accuracy, but I doubt that converting to GGUF can preserve that accuracy; it is probably impossible.
Hi @LDLINGLINGLING ~
It is true that --awq-path was removed by llama.cpp! You can refer to this PR:
https://github.com/ggerganov/llama.cpp/pull/5768
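With --awq-path gone, the flow that should still work (a minimal sketch, assuming an export-compatible AWQ checkpoint saved by AutoAWQ and a recent llama.cpp checkout; the folder names and the llama-quantize binary name are placeholders, not from this thread) is to convert the still-fp16 checkpoint to GGUF first and only then quantize inside llama.cpp:

import subprocess

awq_dir = "mistral-awq"                  # folder written by model.save_quantized(...)
gguf_f16 = f"{awq_dir}/model-f16.gguf"
gguf_q4 = f"{awq_dir}/model-Q4_K_M.gguf"

# Convert the export-compatible (still fp16) checkpoint to an fp16 GGUF.
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", awq_dir,
     "--outfile", gguf_f16, "--outtype", "f16"],
    check=True,
)

# Quantize the fp16 GGUF inside llama.cpp (the binary is called 'quantize'
# in older checkouts and 'llama-quantize' in newer ones).
subprocess.run(
    ["llama.cpp/llama-quantize", gguf_f16, gguf_q4, "Q4_K_M"],
    check=True,
)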
By the way, I'm running into an error that might be similar to this issue; I hope someone can help me.
I had already converted a Phi-3-mini-128K model to AWQ, but when I try to convert the Phi-3 AWQ model to GGUF (with llama.cpp's convert_hf_to_gguf.py), I get the error below.
INFO:hf-to-gguf:Loading model: Phi-3-mini-128k-instruct-AWQ
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'user' %}{{'<|user|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
' + message['content'] + '<|end|>
'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight, torch.float16 --> F16, shape = {3072, 32064}
INFO:hf-to-gguf:token_embd.weight, torch.float16 --> F16, shape = {3072, 32064}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.float16 --> F32, shape = {3072}
Traceback (most recent call last):
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 3547, in <module>
main()
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 3541, in main
model_instance.write()
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 330, in write
self.write_tensors()
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 267, in write_tensors
for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 234, in modify_tensors
return [(self.map_tensor_name(name), data_torch)]
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 185, in map_tensor_name
raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.qweight'
The error says the converter cannot map that tensor to a defined layer.
I was wondering: could it be caused by the layer being stored under the new name model.layers.0.mlp.down_proj.qweight rather than the original name model.layers.0.mlp.down_proj.weight?
If so, how do I fix it? A quick way to check what the checkpoint actually contains is sketched below.
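A minimal sketch of that check (the checkpoint path is assumed from the log above; everything else is illustrative):

from safetensors import safe_open

# Path assumed from the log above; adjust to wherever the AWQ checkpoint lives.
ckpt = "Phi-3-mini-128k-instruct-AWQ/model.safetensors"

# A packed AWQ checkpoint stores qweight/qzeros/scales for each linear layer
# instead of a single plain .weight tensor, which convert_hf_to_gguf.py cannot map.
with safe_open(ckpt, framework="pt") as f:
    for name in f.keys():
        if "layers.0.mlp.down_proj" in name:
            print(name)

If qweight/qzeros/scales show up there instead of a single .weight tensor, the checkpoint was saved with the weights already packed; if I understand export_compatible=True correctly, quantizing with that flag and saving again should leave plain fp16 .weight tensors that the converter can map.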
Sorry for my bad English, but I hope someone can help. ;-;
BR, Matt.