
GGML to GGUF conversion fails: Quantized tensor bytes per row (5120) is not a multiple of Q2_K type size (84)

Open · chokoon123 opened this issue 2 days ago · 0 comments

I'm trying to convert this GGML model to GGUF, but I get the error below. Thank you.

python convert_llama_ggml_to_gguf.py --input "D:\nectec\model\llama-2-13b-chat.ggmlv3.q2_K.bin" --output "D:\nectec\model\llama-2-13b-chat.gguf"
INFO:ggml-to-gguf:* Using config: Namespace(input=WindowsPath('D:/nectec/model/llama-2-13b-chat.ggmlv3.q2_K.bin'), output=WindowsPath('D:/nectec/model/llama-2-13b-chat.gguf'), name=None, desc=None, gqa=8, eps='0', context_length=2048, model_metadata_dir=None, vocab_dir=None, vocabtype='spm,hfft', verbose=False)
WARNING:ggml-to-gguf:=== WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===
INFO:ggml-to-gguf:* Scanning GGML input file
INFO:ggml-to-gguf:* File format: GGJTv3 with ftype MOSTLY_Q2_K
INFO:ggml-to-gguf:* GGML model hyperparameters: <Hyperparameters: n_vocab=32000, n_embd=5120, n_mult=256, n_head=40, n_layer=40, n_rot=128, n_ff=13824, ftype=MOSTLY_Q2_K>
WARNING:ggml-to-gguf: === WARNING === Special tokens may not be converted correctly. Use --model-metadata-dir if possible === WARNING ===
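For reference, a Q2_K super-block in ggml packs 256 weights into 84 bytes (16 scale bytes + 64 quant bytes + two fp16 super-block scales), so, assuming that layout, a row of n_embd = 5120 weights should occupy 1680 bytes, not the 5120 bytes the converter reports:

```python
# Expected Q2_K row size for n_embd = 5120, assuming the current ggml
# block_q2_K layout (QK_K = 256 weights per super-block, 84 bytes each).
QK_K = 256       # weights per Q2_K super-block
TYPE_SIZE = 84   # bytes per super-block: 16 scales + 64 quants + 2x fp16

n_embd = 5120
expected_row_bytes = n_embd // QK_K * TYPE_SIZE
print(expected_row_bytes)  # 1680 -- yet the converter sees 5120 bytes per row
```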

INFO:ggml-to-gguf:- Guessed n_kv_head = 5 based on GQA 8
INFO:ggml-to-gguf:* Preparing to save GGUF file
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:ggml-to-gguf:* Adding model parameters and KV items
INFO:ggml-to-gguf:* Adding 32000 vocab item(s)
INFO:ggml-to-gguf:* Adding 363 tensor(s)
Traceback (most recent call last):
  File "D:\nectec\model\New folder\llama.cpp\convert_llama_ggml_to_gguf.py", line 450, in <module>
    main()
  File "D:\nectec\model\New folder\llama.cpp\convert_llama_ggml_to_gguf.py", line 445, in main
    converter.save()
  File "D:\nectec\model\New folder\llama.cpp\convert_llama_ggml_to_gguf.py", line 238, in save
    self.add_tensors(gguf_writer)
  File "D:\nectec\model\New folder\llama.cpp\convert_llama_ggml_to_gguf.py", line 353, in add_tensors
    gguf_writer.add_tensor(
  File "D:\nectec\model\New folder\llama.cpp\gguf-py\gguf\gguf_writer.py", line 381, in add_tensor
    self.add_tensor_info(name, shape, tensor.dtype, tensor.nbytes, raw_dtype=raw_dtype)
  File "D:\nectec\model\New folder\llama.cpp\gguf-py\gguf\gguf_writer.py", line 354, in add_tensor_info
    tensor_shape = quant_shape_from_byte_shape(tensor_shape, raw_dtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\nectec\model\New folder\llama.cpp\gguf-py\gguf\quants.py", line 24, in quant_shape_from_byte_shape
    raise ValueError(f"Quantized tensor bytes per row ({shape[-1]}) is not a multiple of {quant_type.name} type size ({type_size})")
ValueError: Quantized tensor bytes per row (5120) is not a multiple of Q2_K type size (84)
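The check that raises is the size-consistency test in gguf-py, which maps a tensor's byte shape back to an element shape. A minimal sketch of that logic, reconstructed from the error text above (the constants and function body here are illustrative, not the verbatim gguf-py source):

```python
# Illustrative reconstruction of the quant_shape_from_byte_shape check,
# based on the error message in the traceback; not the exact library code.
Q2_K_BLOCK_SIZE = 256  # elements encoded per Q2_K super-block
Q2_K_TYPE_SIZE = 84    # bytes per Q2_K super-block

def quant_shape_from_byte_shape(shape):
    if shape[-1] % Q2_K_TYPE_SIZE != 0:
        raise ValueError(
            f"Quantized tensor bytes per row ({shape[-1]}) is not a multiple "
            f"of Q2_K type size ({Q2_K_TYPE_SIZE})"
        )
    # bytes per row -> elements per row: each 84-byte block holds 256 weights
    return (*shape[:-1], shape[-1] // Q2_K_TYPE_SIZE * Q2_K_BLOCK_SIZE)

# The failing case from the traceback: 5120 % 84 == 80, so the check fails
# and the conversion aborts.
quant_shape_from_byte_shape((5120,))  # raises ValueError
```

So the q2_K data in this GGMLv3 file is not laid out the way the current Q2_K reader expects; as the script itself warns, the conversion is best-effort, and a native GGUF model is the more reliable path.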

chokoon123 · Feb 20 '25 14:02