llama.cpp
Converting LLaMA 4-bit GPTQ Model from HF does not work
Hi! I tried to use the 13B Model from https://huggingface.co/maderix/llama-65b-4bit/
I converted the model using
python convert-gptq-to-ggml.py models/llama13b-4bit.pt models/tokenizer.model models/llama13b-4bit.bin
If I understand it correctly, I still need to migrate the model, which I tried using
python migrate-ggml-2023-03-30-pr613.py models/llama13b-4bit.bin models/llama13b-4bit-new.bin
But after a few seconds this breaks with the following error:
Processing part 1 of 1
Processing tensor b'tok_embeddings.weight' with shape: [32000, 5120] and type: F16
Traceback (most recent call last):
File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 311, in <module>
main()
File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 306, in main
copy_tensors(fin, fout, part_id, n_parts)
File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 169, in copy_tensors
assert n_dims in (1, 2)
AssertionError
Is this a bug, or am I doing something wrong?
As of today's master, you don't need to run the migrate script; convert-gptq-to-ggml.py generates the latest version of the model. Check the first 4 bytes of the generated file: the latest version should be 0x67676d66, while the old version that needs migration is 0x67676d6c.
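If it helps, here is a minimal Python sketch for checking those first 4 bytes (this assumes the magic is stored as a little-endian 32-bit integer, which matches the hex values quoted in this thread):

```python
import struct
import sys

# Read the first 4 bytes of a ggml model file and print them as a
# little-endian 32-bit magic value.
with open(sys.argv[1], "rb") as f:  # e.g. models/llama13b-4bit.bin
    magic = struct.unpack("<I", f.read(4))[0]
print(hex(magic))
```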
Ah, I see! Well, it is 0x67676d66 already, but main expects a different version:
llama13b-4bit.bin: invalid model file (bad magic [got 0x67676d66 want 0x67676a74])
I did a git pull a few hours ago and converted the model afterwards.
I guess convert-gptq-to-ggml.py needs an update? I just changed the version bytes and now it works!
How many different GGML BIN file headers are there floating around now? 3? 4?
Asking for a friend...
I think there are 3: the original one (A), the new one (B), and the recently introduced one (C).
To get from A -> B, run convert-unversioned-ggml-to-ggml.py. To get from B -> C, run migrate-ggml-2023-03-30-pr613.py.
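To make the chain concrete, here is a small sketch that maps the magic value of a file to the step that is still needed. The magic for C is the one main asked for above (0x67676a74); the other two are the values quoted earlier in this thread:

```python
import struct
import sys

# Magic values as quoted in this thread:
#   A: 0x67676d6c (original)  -> run convert-unversioned-ggml-to-ggml.py
#   B: 0x67676d66 (new)       -> run migrate-ggml-2023-03-30-pr613.py
#   C: 0x67676a74 (latest)    -> nothing to do
NEXT_STEP = {
    0x67676d6c: "A: run convert-unversioned-ggml-to-ggml.py to get to B",
    0x67676d66: "B: run migrate-ggml-2023-03-30-pr613.py to get to C",
    0x67676a74: "C: already the latest format, nothing to do",
}

with open(sys.argv[1], "rb") as f:
    magic = struct.unpack("<I", f.read(4))[0]

print(NEXT_STEP.get(magic, f"unknown magic {hex(magic)}"))
```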
@xonfour how did you change the version bytes?
Just change 0x67676d66 to 0x67676a74 on line 39 of convert-gptq-to-ggml.py and rerun the script.
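For illustration only (assuming the magic is written with struct.pack, as in the other convert scripts), the change amounts to something like:

```python
# Hypothetical before/after around that line in convert-gptq-to-ggml.py,
# assuming the magic is written with struct.pack as in the other converters:
# fout.write(struct.pack("i", 0x67676d66))  # old magic ("ggmf")
fout.write(struct.pack("i", 0x67676a74))    # new magic ("ggjt") expected by main
```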
I will prepare a pull request with the fix soon (after I test it).
Fix in https://github.com/ggerganov/llama.cpp/pull/770
After converting from GPTQ to GGML, do you still get the benefits of GPTQ, i.e. its better accuracy compared to RTN quantization?
Try the new convert.py script that is now in master, please.
@xonfour By looking at the commit log of convert.py ("notes on latest GPTQ-for-LLaMA format"), it seems the issue has been solved with the latest convert.py: python convert.py llama-7b-4bit.pt --vocab-dir models --outtype=f16 --outfile models/7B/ggml-model.bin