
Converting llama 4-bit GPTQ model from HF does not work


Hi! I tried to use the 13B model from https://huggingface.co/maderix/llama-65b-4bit/

I converted the model using

python convert-gptq-to-ggml.py models/llama13b-4bit.pt models/tokenizer.model models/llama13b-4bit.bin

If I understand it correctly, I still need to migrate the model, so I tried:

python migrate-ggml-2023-03-30-pr613.py models/llama13b-4bit.bin models/llama13b-4bit-new.bin

But after a few seconds this breaks with the following error:

Processing part 1 of 1

Processing tensor b'tok_embeddings.weight' with shape: [32000, 5120] and type: F16
Traceback (most recent call last):
  File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 311, in <module>
    main()
  File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 306, in main
    copy_tensors(fin, fout, part_id, n_parts)
  File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 169, in copy_tensors
    assert n_dims in (1, 2)
AssertionError

Is this a bug, or am I doing something wrong?

xonfour avatar Apr 03 '23 18:04 xonfour

As of today's master, you don't need to run the migrate script; convert-gptq-to-ggml.py generates the latest version of the model. Check the first 4 bytes of the generated file: the latest version should be 0x67676d66, while the old version that needs migration is 0x67676d6c.
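
A quick way to check the magic is something like this (a rough sketch, assuming the header is a little-endian 32-bit int, which is how the convert scripts write it on typical machines):

import struct
import sys

# Read the first 4 bytes of the model file and print them as a hex magic.
# 0x67676d6c = old unversioned format (needs migration), 0x67676d66 = newer versioned format.
with open(sys.argv[1], "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))
print(f"magic: {magic:#010x}")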

howard0su avatar Apr 04 '23 11:04 howard0su

Ah, I see! Well, it is 0x67676d66 already, but main expects a different version:

llama13b-4bit.bin: invalid model file (bad magic [got 0x67676d66 want 0x67676a74])

I did a git pull a few hours ago and converted the model afterwards.

xonfour avatar Apr 04 '23 12:04 xonfour

I guess convert-gptq-to-ggml.py needs an update? I just changed the version bytes and now it works!

xonfour avatar Apr 04 '23 12:04 xonfour

How many different GGML BIN file headers are there floating around now? 3? 4?

Asking for a friend...

JohnnyOpcode avatar Apr 04 '23 12:04 JohnnyOpcode

I think there are 3: the original one (A), the new one (B), and the recently introduced one (C).

To get from A -> B, run convert-unversioned-ggml-to-ggml.py. To get from B -> C, run migrate-ggml-2023-03-30-pr613.py.
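
For example, starting from an old unversioned file, the chain would look roughly like this (paths are placeholders, and the first script's exact arguments should be double-checked against its usage message):

python convert-unversioned-ggml-to-ggml.py models/old-model.bin models/tokenizer.model
python migrate-ggml-2023-03-30-pr613.py models/old-model.bin models/old-model-new.bin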

howard0su avatar Apr 04 '23 15:04 howard0su

@xonfour how did you change the version bytes?

LoriTosoChef avatar Apr 04 '23 16:04 LoriTosoChef

@xonfour how did you change the version bytes?

Just change 0x67676d66 to 0x67676a74 on line 39 of convert-gptq-to-ggml.py and rerun the script.
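
In code terms, the edit is roughly this (a sketch; the exact line in convert-gptq-to-ggml.py may read slightly differently):

fout.write(struct.pack("i", 0x67676a74))  # magic "ggjt"; was 0x67676d66 ("ggmf")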

I will prepare a pull request with the fix soon (after I test it).

prusnak avatar Apr 04 '23 17:04 prusnak

Fix in https://github.com/ggerganov/llama.cpp/pull/770

prusnak avatar Apr 05 '23 07:04 prusnak

After converting GPTQ to GGML, do you still get the benefits of GPTQ, i.e. its better accuracy compared to RTN quantization?

xportz avatar Apr 11 '23 21:04 xportz

Please try the new convert.py script that is now in master.

prusnak avatar Apr 14 '23 13:04 prusnak

@xonfour Judging by the commit log of convert.py ("notes on latest GPTQ-for-LLaMA format"), the issue has been solved with the latest convert.py:

python convert.py llama-7b-4bit.pt --vocab-dir models --outtype=f16 --outfile models/7B/ggml-model.bin

wyklq avatar May 22 '23 07:05 wyklq