AutoAWQ
AWQ int4 to GGUF: ModuleNotFoundError: No module named 'awq.apply_awq'
I want to quantize a model with AWQ and then convert it to GGUF with llama.cpp. I followed the tutorial, but I got this error:

Traceback (most recent call last):
  File "/root/ld/ld_project/llama.cpp/convert_minicpm.py", line 2516, in
ModuleNotFoundError: No module named 'awq.apply_awq'
My versions are autoawq 0.2.5+cu121 and autoawq_kernels 0.0.6.
This is my code:

import os
import subprocess

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'mistralai/Mistral-7B-v0.1'
quant_path = 'mistral-awq'
llama_cpp_path = '/workspace/llama.cpp'
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 6, "version": "GEMM"}

# Load model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(
    model_path, **{"low_cpu_mem_usage": True, "use_cache": False}
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize (export_compatible keeps the weights in a convertible format)
model.quantize(tokenizer, quant_config=quant_config, export_compatible=True)

# Save the quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')

# GGUF conversion
print('Converting model to GGUF...')
llama_cpp_method = "q4_K_M"
convert_cmd_path = os.path.join(llama_cpp_path, "convert.py")
quantize_cmd_path = os.path.join(llama_cpp_path, "quantize")

if not os.path.exists(llama_cpp_path):
    cmd = (
        f"git clone https://github.com/ggerganov/llama.cpp.git {llama_cpp_path} "
        f"&& cd {llama_cpp_path} && make LLAMA_CUBLAS=1 LLAMA_CUDA_F16=1"
    )
    subprocess.run([cmd], shell=True, check=True)

subprocess.run(
    [f"python {convert_cmd_path} {quant_path} --outfile {quant_path}/model.gguf"],
    shell=True, check=True,
)

subprocess.run(
    [f"{quantize_cmd_path} {quant_path}/model.gguf {quant_path}/model_{llama_cpp_method}.gguf {llama_cpp_method}"],
    shell=True, check=True,
)
Hi @LDLINGLINGLING. The traceback in your first message seems to come from llama.cpp rather than from this package. Have you tried the GGUF export example from the AutoAWQ documentation, and did it succeed?
https://casper-hansen.github.io/AutoAWQ/examples/#gguf-export
It didn't succeed. I followed the instructions at https://casper-hansen.github.io/AutoAWQ/examples/#gguf-export, but the error at the top appeared.
I now think this operation is pointless. I originally chose AWQ for its high quantization accuracy, but I doubt that converting to GGUF can preserve that accuracy; it is probably impossible.
Hi @LDLINGLINGLING ~
It is true that --awq-path was removed by llama.cpp! You can refer to this PR:
https://github.com/ggerganov/llama.cpp/pull/5768
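With --awq-path gone, the flow that should still work (a minimal sketch, assuming an export-compatible AWQ checkpoint saved by AutoAWQ and a recent llama.cpp checkout; the folder names and the llama-quantize binary name are placeholders, not from this thread) is to convert the still-fp16 checkpoint to GGUF first and only then quantize inside llama.cpp:

import subprocess

awq_dir = "mistral-awq"                  # folder written by model.save_quantized(...)
gguf_f16 = f"{awq_dir}/model-f16.gguf"
gguf_q4 = f"{awq_dir}/model-Q4_K_M.gguf"

# Convert the export-compatible (still fp16) checkpoint to an fp16 GGUF.
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", awq_dir,
     "--outfile", gguf_f16, "--outtype", "f16"],
    check=True,
)

# Quantize the fp16 GGUF inside llama.cpp (the binary is called 'quantize'
# in older checkouts and 'llama-quantize' in newer ones).
subprocess.run(
    ["llama.cpp/llama-quantize", gguf_f16, gguf_q4, "Q4_K_M"],
    check=True,
)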
By the way, I'm running into an error that might be similar to this issue; I hope someone can help me.
I had already converted a Phi-3-mini-128K model to AWQ, but when I try to convert the Phi-3 AWQ model to GGUF (with llama.cpp's convert_hf_to_gguf.py), I get the error below.
INFO:hf-to-gguf:Loading model: Phi-3-mini-128k-instruct-AWQ
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'user' %}{{'<|user|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
' + message['content'] + '<|end|>
'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight, torch.float16 --> F16, shape = {3072, 32064}
INFO:hf-to-gguf:token_embd.weight, torch.float16 --> F16, shape = {3072, 32064}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.float16 --> F32, shape = {3072}
Traceback (most recent call last):
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 3547, in <module>
main()
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 3541, in main
model_instance.write()
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 330, in write
self.write_tensors()
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 267, in write_tensors
for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 234, in modify_tensors
return [(self.map_tensor_name(name), data_torch)]
File "/home/matt/work/llama.cpp/convert_hf_to_gguf.py", line 185, in map_tensor_name
raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.qweight'
The error says the converter cannot map that tensor to a defined layer.
I was wondering: could it be caused by the layer being stored under the new name model.layers.0.mlp.down_proj.qweight rather than the original name model.layers.0.mlp.down_proj.weight?
If so, how do I fix it? A quick way to check what the checkpoint actually contains is sketched below.
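A minimal sketch of that check (the checkpoint path is assumed from the log above; everything else is illustrative):

from safetensors import safe_open

# Path assumed from the log above; adjust to wherever the AWQ checkpoint lives.
ckpt = "Phi-3-mini-128k-instruct-AWQ/model.safetensors"

# A packed AWQ checkpoint stores qweight/qzeros/scales for each linear layer
# instead of a single plain .weight tensor, which convert_hf_to_gguf.py cannot map.
with safe_open(ckpt, framework="pt") as f:
    for name in f.keys():
        if "layers.0.mlp.down_proj" in name:
            print(name)

If qweight/qzeros/scales show up there instead of a single .weight tensor, the checkpoint was saved with the weights already packed; if I understand export_compatible=True correctly, quantizing with that flag and saving again should leave plain fp16 .weight tensors that the converter can map.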
Sorry for my bad English, but I hope someone can help. ;-;
BR, Matt.