llama.cpp
add phi3 support
Make phi3 an explicit model supported in llama.cpp.
Might have to add <|end|> as an EOT token:
diff --git a/llama.cpp b/llama.cpp
index 63483b9a..698ad236 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -4381,6 +4381,7 @@ static void llm_load_vocab(
//vocab.id_to_token[t.second].type == LLAMA_TOKEN_TYPE_CONTROL &&
(t.first == "<|eot_id|>" ||
t.first == "<|im_end|>" ||
+ t.first == "<|end|>" ||
t.first == "<end_of_turn>"
)
) {
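As a quick sanity check before relying on that match, here is a minimal sketch (assuming the transformers library; the Hugging Face checkpoint name is an assumption, not taken from this PR) that confirms <|end|> really exists as a token in the Phi-3 tokenizer and prints its id:

# Hypothetical check: confirm <|end|> is a known token in the Phi-3 tokenizer.
# The checkpoint name is a placeholder assumption.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
end_id = tok.convert_tokens_to_ids("<|end|>")
print("<|end|> id:", end_id)        # should be a real id, not the unk id
print("unk id:   ", tok.unk_token_id)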
This seems to be producing better results than https://github.com/ggerganov/llama.cpp/pull/6851. For example, I don't see the <|calc|> and <|/data|> tokens generated. I wonder where the difference comes from?
So the difference was in the tokenization: in the other PR the <|system|> token was being incorrectly mapped to 32034, while it should have been 32006. I applied the changes from this PR into _set_vocab_sentencepiece(), since this implementation seems to be correct: https://github.com/ggerganov/llama.cpp/pull/6851/commits/5dcccb3a7d0a6d1ce5886dd1f055555fd7759568
I wonder if it affects the conversion of some other models too?
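To check whether a given model is affected, one option is a minimal sketch like the following (assuming the transformers library; the checkpoint name is a placeholder): dump the added-token map as the HF tokenizer sees it, so a mis-mapping such as <|system|> ending up at 32034 instead of 32006 in the converted GGUF is easy to spot.

# Hypothetical debugging aid: list added tokens and their ids from the HF
# tokenizer, to compare against what the converted GGUF reports.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
for token, idx in sorted(tok.get_added_vocab().items(), key=lambda kv: kv[1]):
    print(f"{idx:6d}  {token}")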
Anyway, now the results match, except that the other PR adds a BOS token at the start while this PR does not:
https://github.com/ggerganov/llama.cpp/pull/6852/files#diff-ecca4c14f9a354b5557247cafd79409b332bfa1e9c12594f83282af1fde4743eR2064
Just double-checking whether this is intended?
There is also a minor issue because of this: the tokenizer.ggml.add_bos_token KV is written twice in the header, once as true and once as false:
python3 gguf-py/scripts/gguf-dump.py models/phi-3-4k-instruct/ggml-model-f16.gguf
* Loading: models/phi-3-4k-instruct/ggml-model-f16-new.gguf
Traceback (most recent call last):
KeyError: 'Duplicate tokenizer.ggml.add_bos_token already in list at offset 725511'
This is because we already write this field automatically here:
https://github.com/ggerganov/llama.cpp/blob/171a73890ec0948293c675a8ab1779e01aac906f/gguf-py/gguf/vocab.py#L61-L71
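A minimal sketch of the idea, assuming the converter uses gguf-py's GGUFWriter and SpecialVocab (the path and arch string below are placeholders, not the actual convert script): let SpecialVocab.add_to_gguf() be the only place that emits the flag, and drop the extra explicit call.

# Sketch only: write the special-token metadata through SpecialVocab once and
# avoid a second explicit call that would duplicate the KV.
from gguf import GGUFWriter, SpecialVocab

gguf_writer = GGUFWriter("ggml-model-f16.gguf", arch="phi3")            # placeholder path/arch
special_vocab = SpecialVocab("models/phi-3-4k-instruct", load_merges=False)
special_vocab.add_to_gguf(gguf_writer)     # writes tokenizer.ggml.add_bos_token once
# gguf_writer.add_add_bos_token(True)      # <- the duplicate write to avoid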
So is the implementation in https://github.com/ggerganov/llama.cpp/pull/6851 preferred or are both needed for official support?
Hi, when using the server, "<|eot_id|>" is still printed at the end of the conversation, and I can't find the stop token in /examples/server/utils.hpp anymore. How can I avoid this "<|eot_id|>" in the server?
Thanks
Most likely you are using the base model instead of the instruct model. See https://github.com/ggerganov/llama.cpp/pull/6916 for a clear explanation and a way to add stop tokens from the client side.
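As an illustration, a minimal client-side sketch (assuming the llama.cpp server is running locally on port 8080 and the Python requests package is installed) that lists the end-of-turn strings in the "stop" field of a /completion request, so generation halts before they are printed:

# Hedged example: stop generation at <|end|> / <|eot_id|> from the client side.
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "<|user|>\nHello, how are you?<|end|>\n<|assistant|>\n",
        "n_predict": 128,
        "stop": ["<|end|>", "<|eot_id|>"],
    },
)
print(resp.json()["content"])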
@ggerganov Hi, no, I was using Phi-3-mini-128k-instruct.Q4_K_M.gguf. Forget it, I think this was only for the server; for non-server use it already works fine.