text-generation-webui
llama3 instruct models need multiple `eos_token_id` to make the output stop correctly
Quick fix for llama3 not stopping correctly
Change tokenizer_config.json
from "eos_token": "<|end_of_text|>",
to "eos_token": "<|eot_id|>",
and it should work.
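If you want to double-check the IDs before editing anything, here is a minimal sketch with transformers (the model folder path is just a placeholder):

from transformers import AutoTokenizer

# Placeholder path; point it at your local Llama 3 Instruct folder.
tokenizer = AutoTokenizer.from_pretrained("models/Meta-Llama-3-8B-Instruct")

print(tokenizer.eos_token, tokenizer.eos_token_id)    # <|end_of_text|> 128001 with the original config
print(tokenizer.convert_tokens_to_ids("<|eot_id|>"))  # 128009, the token that actually ends a turn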
If it's just a tokenizer bug, this enhancement issue doesn't need to be implemented.
Description
It seems that llama3 instruct models need multiple eos_token_id values to make the output stop properly.
I manually changed it and it works well.
https://github.com/oobabooga/text-generation-webui/blob/26d822f64f2a029306b250b69dc58468662a4fc6/modules/text_generation.py#L325
However, I can't find this token in the tokenizer. Should we add a custom_eos_token_ids option or something?
It's also added manually in the official demo.
Also, the stop parameter (stopping_strings) only works when skip_special_tokens is turned off; if skip_special_tokens is kept on, stopping_strings can't stop the output because it can't match the <|eot_id|> in the output.
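For reference, transformers itself already accepts a list for eos_token_id at generation time, so something along these lines should stop on either token (just a sketch; the model path and prompt are placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

path = "models/Meta-Llama-3-8B-Instruct"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=[128001, 128009],  # <|end_of_text|> and <|eot_id|>
)
print(tokenizer.decode(output[0], skip_special_tokens=True))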
FWIW Meta changed the official config to reflect this a few hours ago but I don't think the webui respects it.
Ah, I have not yet had access to their models
Can you show me what they changed? Thanks.
diff --git a/generation_config.json b/generation_config.json
index 4358365..aecb1b8 100644
--- a/generation_config.json
+++ b/generation_config.json
@@ -1,6 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 128000,
- "eos_token_id": 128001,
+ "eos_token_id": [128001, 128009],
"transformers_version": "4.40.0.dev0"
}
diff --git a/tokenizer_config.json b/tokenizer_config.json
index 5777175..870479e 100644
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -2050,7 +2050,7 @@
}
},
"bos_token": "<|begin_of_text|>",
- "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",
+ "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
"clean_up_tokenization_spaces": true,
"eos_token": "<|end_of_text|>",
"model_input_names": [
The thing is, I don't think the webui respects eos_token_id as a list like that, because it still doesn't halt properly.
The non-instruct 8B doesn't have that issue but counterintuitively it actually stops too early in most outputs. Sometimes after only one token.
Thank you. I will test it later.
However, even if they changed it to a list, that's still only in generation_config.json, and the webui takes the eos_token from the tokenizer.
Fixed GGUF models here: https://huggingface.co/AI-Engine/Meta-Llama-3-8B-Instruct-GGUF/tree/main
Quick fix for llama3 not stopping correctly
You also need to mention that this will break it for everything other than llama-3, otherwise some people would just blindly make the changes. You should probably call it a hack instead of a fix. (I don't mean it in a bad way, but that's what it is.)
Uh, sorry.
This change only applies to models that use this tokenizer_config.json; in most cases, that means only the model in the same folder.
I'm not sure if this change will break anything else; can you give me some examples?
It seems that Phi 3 also requires multiple EOS tokens: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/generation_config.json
"eos_token_id": [
32000,
32007
],
Can this be fixed, please? I am using the GGUF format, so I can't just change tokenizer_config.json. I have also tried modifying text_generation.py, but it doesn't fix the problem.
Btw, in my case with Meta-Llama-3-70B-Instruct.IQ2_XS.gguf, the right stopping token seems to be "assistant", which is token 78191.
Having the same issue. Chat mode works fine, but the API gives an endless response even if I add custom stopping tokens.
For the API, I had to manually insert the fields 'skip_special_tokens': False, 'custom_stopping_strings': '"<eot_id>"' in completions.py, as the other side doesn't insert those fields. I think the API should fill in missing parameters from the configured webui defaults.
Have you changed the generate_params.update line in completions.py? I updated it with your lines of code, but it didn't work. Actually, for me it doesn't work anywhere: not in the Chat or Notebook tabs, and not in the API either; it just goes on and on.
In the case of Meta-Llama-3-70B-Instruct.IQ2_XS.gguf, the only way to fix it in Chat mode is to add "assistant" under Parameters/Custom stopping strings, but obviously that has no effect on the API.
I really do not understand how the hottest local LLM on the market still doesn't work properly in Oobabooga! The fix should be as easy as adding an additional stopping token.
As I mentioned above, these "fixed" models worked for me (I didn't have to edit anything). Can you try one of them?
Link to the models: https://huggingface.co/AI-Engine/Meta-Llama-3-8B-Instruct-GGUF/tree/main
PS: The fixed models have "with_temp_stop_token_fix" in the name.
Man, I can confirm that the "fixed" model works fine on my end. However, as you see from the model name, I am using the 70B model. If there is a fixed version, that would be great! Or if someone can fix the API so that the 'custom_stopping_strings' parameter in the payload actually works.
Oh, it's not named custom_stopping_strings. I'm truly sorry if I confused you.
Try:
{
...
"skip_special_tokens": false,
"stop": ["<|eot_id|>"]
}
Don't forget the "skip_special_tokens": false.
What I called custom_stopping_strings is a variable name in the code.
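So a complete request against the OpenAI-compatible endpoint would look roughly like this (a sketch: the host/port are the defaults on my setup, the prompt is a placeholder, and I'm assuming the extra fields get passed through to the generation parameters):

import requests

payload = {
    "prompt": "Why is the sky blue?",   # placeholder prompt
    "max_tokens": 256,
    "skip_special_tokens": False,       # keep <|eot_id|> visible in the raw output
    "stop": ["<|eot_id|>"],             # matched against the decoded text
}
response = requests.post("http://127.0.0.1:5000/v1/completions", json=payload)
print(response.json()["choices"][0]["text"])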
Man, that helped! For some reason adding these fields to completions.py didn't do the job, but I added ["assistant"] as a 'stop' field in the payload and it worked! Now of course it will stop every time the word "assistant" appears in the answer, but I think I can live with that)) Thanks!
Are multiple EOS tokens going to be handled OOTB soon?
We need a script to fix all the "broken" llama-3 models around. I dumped the chat template from a working one, and it is:
{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>
' }}{% endif %}
which in JSON is:
"{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}{% endif %}"
I think that the problem here is that the eot_id cannot be obtained from the shared.tokenizer object loaded through AutoTokenizer.from_pretrained. This can be verified by checking the attributes under dir(shared.tokenizer). The EOS token is there, but not the EOT token, which is new.
If the EOT token were known, it could simply be added to this list
https://github.com/oobabooga/text-generation-webui/blob/abe5ddc8833206381c43b002e95788d4cca0893a/modules/text_generation.py#L319
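For what it's worth, the ID can usually still be recovered from the vocabulary by string, even without a dedicated attribute. A self-contained sketch (the path is a placeholder, and eos_token_ids here just mimics the list at the linked line):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("models/Meta-Llama-3-8B-Instruct")  # placeholder path
eos_token_ids = [tokenizer.eos_token_id]

# Look the new turn-end token up by string; skip it if the model doesn't define it.
eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")
if eot_id is not None and eot_id != tokenizer.unk_token_id:
    eos_token_ids.append(eot_id)

print(eos_token_ids)  # e.g. [128001, 128009] for Llama 3 Instruct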
Yes, we could add some custom eos tokens.
But the eos_token_id parameter doesn't seem to work for llamacpp/gguf.
I couldn't find a suitable generalized solution, so I didn't make a PR.
I think that this may have fixed the issue by looking at the eot token in string format rather than its ID.
https://github.com/oobabooga/text-generation-webui/commit/5499bc9bc8d2b24f163c0026dce05df21a25a691
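If I read that right, the idea is to match the stop token as text rather than by ID. A rough illustration of that approach with transformers' StoppingCriteria (just a sketch, not the actual webui code):

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnStrings(StoppingCriteria):
    def __init__(self, tokenizer, stop_strings, prompt_length):
        self.tokenizer = tokenizer
        self.stop_strings = stop_strings
        self.prompt_length = prompt_length  # number of prompt tokens to skip when decoding

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        # Decode only the newly generated part, keeping special tokens so a
        # string like "<|eot_id|>" stays visible for matching.
        text = self.tokenizer.decode(input_ids[0][self.prompt_length:], skip_special_tokens=False)
        return any(s in text for s in self.stop_strings)

# Usage: model.generate(..., stopping_criteria=StoppingCriteriaList([StopOnStrings(tokenizer, ["<|eot_id|>"], n_prompt_tokens)]))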
thanks again
Thanks again
As I mentioned above, these "fixed" models worked for me (I didn't have to edit anything). Can you try one of them?
Link to the models: https://huggingface.co/AI-Engine/Meta-Llama-3-8B-Instruct-GGUF/tree/main
PS: The fixed models have "with_temp_stop_token_fix" in the name.
Just for understanding: using this model does not need any other changes in text-generation-webui or any model files, because GGUF files are something like an all-inclusive file?
Using that model will result in a llama3 chat that does not repeat the text again and again?