GGUF breaks - llama-3
Findings from https://github.com/ggerganov/llama.cpp/issues/7062 and Discord chats.

Notebook for repro: https://colab.research.google.com/drive/1djwQGbEJtUEZo_OuqzN_JF6xSOUKhm4q?usp=sharing
- Unsloth + float16 + QLoRA = WORKS
- Unsloth + bfloat16 + QLoRA = WORKS
- Unsloth + bfloat16 + LoRA = WORKS
- Unsloth + float16 + QLoRA + GGUF-f16 = FAILS
- Unsloth + bfloat16 + LoRA + GGUF-f16 = FAILS
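
For concreteness, here is a minimal sketch of the failing "Unsloth + float16 + QLoRA + GGUF-f16" path, assuming Unsloth's documented `FastLanguageModel` / `save_pretrained_gguf` API; the model name and LoRA hyperparameters are illustrative, not the exact repro:

```python
# Hedged sketch of the Unsloth + float16 + QLoRA + GGUF-f16 path.
# Model name and LoRA hyperparameters are illustrative, not from the repro.
import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # 4-bit base weights = QLoRA
    max_seq_length=2048,
    dtype=torch.float16,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# ... fine-tune here ...

# Merge + export to GGUF at f16 -- the step after which tokenization breaks.
model.save_pretrained_gguf("model", tokenizer, quantization_method="f16")
```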
Todo:
- [ ] HF directly + float16 + QLoRA + GGUF-f16
- [x] HF directly + float16 + LoRA + GGUF-f16
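
For the "HF directly" rows, the setup is roughly the following sketch using plain transformers + peft; the model name and LoRA config are assumptions:

```python
# Hedged sketch of the "HF directly + float16 + LoRA" setup, using plain
# transformers + peft. Model name and LoRA config are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=16, task_type="CAUSAL_LM"))

# ... fine-tune here ...

# Merge the LoRA adapters back into the base weights and save, then run
# llama.cpp's convert-hf-to-gguf.py on ./model (see the command below).
merged = model.merge_and_unload()
merged.save_pretrained("./model")
tokenizer.save_pretrained("./model")
```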
Update: I managed to test HF -> llama.cpp without Unsloth, to take Unsloth out of the picture.
- `\n\n` is tokenized as [1734, 1734], unless I prompted it incorrectly.
- Decoding [1734] with `tokenizer.batch_decode([1734])` returns `\\n`.
- I.e., `llama.cpp` is tokenizing `\n\n` as `\\n\\n`.
- Using HF directly, we get:
- `\\n` = 1734
- `\n` = 198
- `\n\n` = 271
- `\n\n\n` = 1432
- 4*`\n` = 1038
- 5*`\n` = 14963
- 6*`\n` = 5244
- 7*`\n` = 35683
- 8*`\n` = 6087
- 9*`\n` = 55160
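
Those ids can be checked against the HF tokenizer directly; a small sketch (the model name is an assumption, but any Llama 3 tokenizer should give the same ids):

```python
# Verify the newline token ids above with the HF tokenizer.
# Model name is an assumption; any Llama 3 tokenizer should match.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

for n in range(1, 10):
    ids = tokenizer("\n" * n, add_special_tokens=False)["input_ids"]
    print(f"{n}*\\n -> {ids}")  # expect [198], [271], [1432], ...

# The two-character literal "\n" (backslash + n) is a different token:
print(tokenizer("\\n", add_special_tokens=False)["input_ids"])  # expect [1734]
print(tokenizer.batch_decode([1734]))                           # expect ['\\n']
```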
I converted with

```
!python llama.cpp/convert-hf-to-gguf.py ./model --outfile ./model.f16.gguf --outtype f16
```

then ran

```
!./llama.cpp/main -m ./model.f16.gguf -n 1024 --temp 0.0 --verbose-prompt --check-tensors \
  -p "<|start_header_id|>user<|end_header_id|>\n\n!!llama.cpp!!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```
See reproducible notebook: https://colab.research.google.com/drive/1aNS8CgXoJZHclBEW3ZjFfiLjpmqZ14KN?usp=sharing
Below is the comparison of tokenization differences between llama.cpp and HF:
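
To reproduce the HF side of that comparison, one can tokenize the exact prompt that was passed to `main`; a hedged sketch (the model name is an assumption):

```python
# Tokenize the same prompt that was passed to llama.cpp/main, on the HF side.
# Model name is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "!!llama.cpp!!<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]

# With HF, each "\n\n" should come out as the single token 271; compare this
# against the token dump that --verbose-prompt prints on the llama.cpp side.
for i in ids:
    print(i, repr(tokenizer.decode([i])))
```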
I also used `convert.py`, which I'm assuming is not supposed to work for this anyway (maybe). I chose `--vocab-type bpe`. Reproducible example: https://colab.research.google.com/drive/1X8XBdLRf1-eRDSfcr_GrIhaf84Wp9FH1?usp=sharing

Sadly `convert.py` is even worse, splitting the newlines into 2 distinct characters.
Thanks for having looked into this. I've been suspicious of these `\n`'s in llama.cpp since I noticed that when I added `\n\n` to Llama 3's prompt, the continuation would usually add a third one at the start of the reply for no obvious reason. What you're finding is probably the reason for that.
It should be fixed!