
llama3 instruct models need multiple `eos_token_id` to make the output stop correctly

Open Yiximail opened this issue 10 months ago • 20 comments

Quick fix for llama3 not stopping correctly

Change tokenizer_config.json from "eos_token": "<|end_of_text|>", to "eos_token": "<|eot_id|>", and it should work. If it's just a tokenizer bug, this enhancement issue doesn't need to be implemented.
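For reference, a minimal sketch of making that edit with a small script (the model folder path is just an example; point it at your own model directory):

import json
from pathlib import Path

# Example path only; adjust to the folder of the model you actually use.
config_path = Path("models/Meta-Llama-3-8B-Instruct/tokenizer_config.json")

config = json.loads(config_path.read_text())
config["eos_token"] = "<|eot_id|>"  # was "<|end_of_text|>"
config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False))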


Description

It seems that llama3 instruct models need multiple eos_token_id values to make the output stop properly. I manually changed it and it works well.

https://github.com/oobabooga/text-generation-webui/blob/26d822f64f2a029306b250b69dc58468662a4fc6/modules/text_generation.py#L325

However, I can't find this token in the tokenizer. Should we add a custom_eos_token_ids option or something?
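If such an option existed, it could be merged into that list roughly like this (a sketch only; custom_eos_token_ids is a hypothetical name, and shared.tokenizer is the webui's loaded tokenizer):

from modules import shared  # the webui's shared state module

# Hypothetical sketch: merge the tokenizer's own EOS id with extra stop token ids.
custom_eos_token_ids = [128009]  # <|eot_id|> in the llama3 instruct vocabulary
eos_token_ids = [int(shared.tokenizer.eos_token_id)] + custom_eos_token_ids
# eos_token_ids would then be passed to the generate call as eos_token_id.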

It's also added manually in the official demo.

Also, the stop parameter (stopping_strings) only works when skip_special_tokens is turned off. If skip_special_tokens stays on, stopping_strings can't stop the output because it can't match the <|eot_id|> in the output.

Yiximail avatar Apr 19 '24 14:04 Yiximail

FWIW Meta changed the official config to reflect this a few hours ago but I don't think the webui respects it.

Beinsezii avatar Apr 20 '24 00:04 Beinsezii

FWIW Meta changed the official config to reflect this a few hours ago but I don't think the webui respects it.

Ah, I don't have access to their models yet.

Can you show me what they changed? Thanks.

Yiximail avatar Apr 20 '24 01:04 Yiximail

diff --git a/generation_config.json b/generation_config.json
index 4358365..aecb1b8 100644
--- a/generation_config.json
+++ b/generation_config.json
@@ -1,6 +1,6 @@
 {
   "_from_model_config": true,
   "bos_token_id": 128000,
-  "eos_token_id": 128001,
+  "eos_token_id": [128001, 128009],
   "transformers_version": "4.40.0.dev0"
 }
diff --git a/tokenizer_config.json b/tokenizer_config.json
index 5777175..870479e 100644
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -2050,7 +2050,7 @@
     }
   },
   "bos_token": "<|begin_of_text|>",
-  "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",
+  "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
   "clean_up_tokenization_spaces": true,
   "eos_token": "<|end_of_text|>",
   "model_input_names": [

Beinsezii avatar Apr 20 '24 01:04 Beinsezii

The thing is, I don't think the webui respects eos_token_id as a list like that, because it still doesn't halt properly.

The non-instruct 8B doesn't have that issue but counterintuitively it actually stops too early in most outputs. Sometimes after only one token.

Beinsezii avatar Apr 20 '24 01:04 Beinsezii

diff --git a/generation_config.json b/generation_config.json
index 4358365..aecb1b8 100644
--- a/generation_config.json
+++ b/generation_config.json
@@ -1,6 +1,6 @@
 {
   "_from_model_config": true,
   "bos_token_id": 128000,
-  "eos_token_id": 128001,
+  "eos_token_id": [128001, 128009],
   "transformers_version": "4.40.0.dev0"
 }
diff --git a/tokenizer_config.json b/tokenizer_config.json
index 5777175..870479e 100644
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -2050,7 +2050,7 @@
     }
   },
   "bos_token": "<|begin_of_text|>",
-  "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",
+  "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
   "clean_up_tokenization_spaces": true,
   "eos_token": "<|end_of_text|>",
   "model_input_names": [

Thank you. I will test it later.

However, even if they changed it to a list, that only applies to generation_config.json, and the webui takes the eos_token from the tokenizer.

Yiximail avatar Apr 20 '24 01:04 Yiximail

Fixed GGUF models here: https://huggingface.co/AI-Engine/Meta-Llama-3-8B-Instruct-GGUF/tree/main

HamedEmine avatar Apr 20 '24 03:04 HamedEmine

Quick fix for llama3 not stopping correctly

You also need to mention that this will break it for everything other than llama-3, otherwise some people will just blindly make the changes. You should probably call it a hack instead of a fix. (I don't mean it in a bad way, but that's what it is.)

FartyPants avatar Apr 21 '24 14:04 FartyPants

Quick fix for llama3 not stopping correctly

You also need to mention that this will break it for everything other than llama-3, otherwise some people will just blindly make the changes. You should probably call it a hack instead of a fix. (I don't mean it in a bad way, but that's what it is.)

Uh, sorry. This change will only apply to models that use this tokenizer_config.json; in most cases, only the model in the same folder. I'm not sure if this change would break anything else. Can you give me some examples?

Yiximail avatar Apr 22 '24 10:04 Yiximail

It seems that Phi 3 also requires multiple EOS tokens: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/generation_config.json

  "eos_token_id": [
    32000,
    32007
  ],

theo77186 avatar Apr 23 '24 19:04 theo77186

Can this be fixed, please? I am using the GGUF format, so I can't just change tokenizer_config.json. I have also tried to modify text_generation.py, but it doesn't fix the problem.

Btw, in my case with Meta-Llama-3-70B-Instruct.IQ2_XS.gguf, the right stopping token seems to be the token "assistant", which is 78191.

goodglitch avatar Apr 24 '24 09:04 goodglitch

Having the same issue. Chat mode works fine, but the API gives an endless response even if I add custom stopping tokens.

cjhandley avatar Apr 24 '24 16:04 cjhandley

For the API, I had to manually insert these fields in completions.py: 'skip_special_tokens': False, 'custom_stopping_strings': '"<eot_id>"'

since the client side doesn't insert those fields. I think the API should fill in missing parameters from the webui's configured defaults.
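In code, the insertion described above would look roughly like this (a sketch of the workaround, not the webui's stock code; generate_params is the dict that completions.py already builds, and the token is presumably meant to be "<|eot_id|>", with the pipe characters):

# Workaround sketch inside completions.py: force these values into the request parameters.
generate_params.update({
    'skip_special_tokens': False,
    # Same quoted format as the "Custom stopping strings" UI field.
    'custom_stopping_strings': '"<|eot_id|>"',
})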

egSat avatar Apr 25 '24 07:04 egSat

For API I had to manually insert in completions.py the fields: 'skip_special_tokens': False, 'custom_stopping_strings': '"<eot_id>"'

as the other side doesnt insert those fields. I think the API should complete the missing parameters from defaults of the webapi that are configured

Have you changed the generate_params.update line in completions.py? I updated it with your lines of code, but it didn't work. Actually, for me it doesn't work anywhere: not in the Chat or Notebook tabs, and not in the API either; it just goes on and on.

In the case of Meta-Llama-3-70B-Instruct.IQ2_XS.gguf, the only way to fix it in Chat mode is to add "assistant" under Parameters/Custom stopping strings, but obviously that doesn't have any effect on the API.

I really do not understand how the hottest local LLM on the market still doesn't work properly in Oobabooga! The fix should be as easy as adding an additional stopping token.

goodglitch avatar Apr 26 '24 09:04 goodglitch

For API I had to manually insert in completions.py the fields: 'skip_special_tokens': False, 'custom_stopping_strings': '"<eot_id>"' as the other side doesnt insert those fields. I think the API should complete the missing parameters from defaults of the webapi that are configured

Have you changed generate_params.update line in completions.py? I have updated it with your lines of code but it didn't work. Actually for me it doesn't work anywhere. It doesn't work in Chat or Notebook tabs, it also doesn't work in API it goes on and on.

In case of Meta-Llama-3-70B-Instruct.IQ2_XS.gguf the only way to fix it in Chat mode is to add "assistant" in Parameters/Custom stopping strings, but obviously it doesn't have any effect on API.

I really do not understand how the hottest local LLM on the market still doesn't work properly in Oobabooga! The fix should be as easy as adding additional stopping token.

As I mentioned above, these "fixed" models worked for me (I didn't have to edit anything). Can you try one of them?

Link to the models: https://huggingface.co/AI-Engine/Meta-Llama-3-8B-Instruct-GGUF/tree/main

PS: The fixed models have "with_temp_stop_token_fix" in the name.

HamedEmine avatar Apr 26 '24 10:04 HamedEmine

For API I had to manually insert in completions.py the fields: 'skip_special_tokens': False, 'custom_stopping_strings': '"<eot_id>"' as the other side doesnt insert those fields. I think the API should complete the missing parameters from defaults of the webapi that are configured

Have you changed generate_params.update line in completions.py? I have updated it with your lines of code but it didn't work. Actually for me it doesn't work anywhere. It doesn't work in Chat or Notebook tabs, it also doesn't work in API it goes on and on. In case of Meta-Llama-3-70B-Instruct.IQ2_XS.gguf the only way to fix it in Chat mode is to add "assistant" in Parameters/Custom stopping strings, but obviously it doesn't have any effect on API. I really do not understand how the hottest local LLM on the market still doesn't work properly in Oobabooga! The fix should be as easy as adding additional stopping token.

As I mentioned above, these "fixed" models worked for me (I didn't have to edit anything) Can you try with one of them?

Link to the models: https://huggingface.co/AI-Engine/Meta-Llama-3-8B-Instruct-GGUF/tree/main

PS: The fixed models have "with_temp_stop_token_fix" in the name.

Man, I can confirm that the "fixed" model works fine on my end. However, as you can see from the model name, I am using the 70B model. If there were a fixed version, that would be great! Or if someone could fix the API so that the 'custom_stopping_strings' parameter in the payload actually works.

goodglitch avatar Apr 26 '24 11:04 goodglitch

For API I had to manually insert in completions.py the fields: 'skip_special_tokens': False, 'custom_stopping_strings': '"<eot_id>"' as the other side doesnt insert those fields. I think the API should complete the missing parameters from defaults of the webapi that are configured

Have you changed generate_params.update line in completions.py? I have updated it with your lines of code but it didn't work. Actually for me it doesn't work anywhere. It doesn't work in Chat or Notebook tabs, it also doesn't work in API it goes on and on. In case of Meta-Llama-3-70B-Instruct.IQ2_XS.gguf the only way to fix it in Chat mode is to add "assistant" in Parameters/Custom stopping strings, but obviously it doesn't have any effect on API. I really do not understand how the hottest local LLM on the market still doesn't work properly in Oobabooga! The fix should be as easy as adding additional stopping token.

As I mentioned above, these "fixed" models worked for me (I didn't have to edit anything) Can you try with one of them? Link to the models: https://huggingface.co/AI-Engine/Meta-Llama-3-8B-Instruct-GGUF/tree/main PS: The fixed models have "with_temp_stop_token_fix" in the name.

Man, I can confirm that "fixed" model works fine on my end. However, as you see from the model name I am using 70B model. If there is fixed version that will be great! Or if someone can fix api to make 'custom_stopping_strings' parameter in payload actually working.

Oh, it's not named custom_stopping_strings. I'm truly sorry if I confused you.

try:

{
  ...
  "skip_special_tokens": false,
  "stop": ["<|eot_id|>"]
}

Don't forget the "skip_special_tokens": false,


The custom_stopping_strings I mentioned is a variable name in the code.
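Put together, a full request against the OpenAI-compatible endpoint would look something like this (the local URL is just the assumed default; adjust host and port to your setup):

import requests

url = "http://127.0.0.1:5000/v1/chat/completions"  # assumed default API address

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 200,
    "skip_special_tokens": False,  # keep <|eot_id|> visible so the stop string can match it
    "stop": ["<|eot_id|>"],        # stop on the llama3 turn terminator
}

response = requests.post(url, json=payload)
print(response.json()["choices"][0]["message"]["content"])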

Yiximail avatar Apr 26 '24 11:04 Yiximail

For API I had to manually insert in completions.py the fields: 'skip_special_tokens': False, 'custom_stopping_strings': '"<eot_id>"' as the other side doesnt insert those fields. I think the API should complete the missing parameters from defaults of the webapi that are configured

Have you changed generate_params.update line in completions.py? I have updated it with your lines of code but it didn't work. Actually for me it doesn't work anywhere. It doesn't work in Chat or Notebook tabs, it also doesn't work in API it goes on and on. In case of Meta-Llama-3-70B-Instruct.IQ2_XS.gguf the only way to fix it in Chat mode is to add "assistant" in Parameters/Custom stopping strings, but obviously it doesn't have any effect on API. I really do not understand how the hottest local LLM on the market still doesn't work properly in Oobabooga! The fix should be as easy as adding additional stopping token.

As I mentioned above, these "fixed" models worked for me (I didn't have to edit anything) Can you try with one of them? Link to the models: https://huggingface.co/AI-Engine/Meta-Llama-3-8B-Instruct-GGUF/tree/main PS: The fixed models have "with_temp_stop_token_fix" in the name.

Man, I can confirm that "fixed" model works fine on my end. However, as you see from the model name I am using 70B model. If there is fixed version that will be great! Or if someone can fix api to make 'custom_stopping_strings' parameter in payload actually working.

Oh, it's not named custom_stopping_strings. I'm truly sorry if I confused you.

try:

{
  ...
  "skip_special_tokens": false,
  "stop": ["<|eot_id|>"]
}

Don't forget the "skip_special_tokens": false,

What I said custom_stopping_strings is a variable name in codes

Man, that helped! For some reason adding these fields to completions.py didn't do the job, but I added ["assistant"] as a 'stop' field in the payload and it worked! Now of course it will stop every time the word "assistant" appears in the answer, but I think I can live with that)) Thanks!

goodglitch avatar Apr 26 '24 11:04 goodglitch

Are multiple EOS tokens going to be handled OOTB soon?

DevasiaThomas avatar May 17 '24 16:05 DevasiaThomas

We need a script to fix all the "broken" llama-3 models around. I dumped the chat template from a working one and it is:

{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>

' }}{% endif %}

which in JSON is:

"{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}{% endif %}"

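A rough sketch of such a fix-up script, assuming the broken copies only need the chat_template in tokenizer_config.json replaced (the file path is supplied by the user):

import json
import sys

# The working template above; "\n\n" here produces the literal newlines shown earlier.
FIXED_TEMPLATE = (
    "{% set loop_messages = messages %}{% for message in loop_messages %}"
    "{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'"
    "+ message['content'] | trim + '<|eot_id|>' %}"
    "{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    "{{ content }}{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
)

def patch_template(path):
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    config["chat_template"] = FIXED_TEMPLATE
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    # e.g. python fix_template.py models/<model>/tokenizer_config.json
    patch_template(sys.argv[1])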
Zibri avatar May 18 '24 12:05 Zibri

I think that the problem here is that the eot_id cannot be obtained from the shared.tokenizer object loaded through AutoTokenizer.from_pretrained. This can be verified by checking the attributes under dir(shared.tokenizer). The EOS token is there, but not the EOT token, which is new.

If the EOT token were known, it could simply be added to this list

https://github.com/oobabooga/text-generation-webui/blob/abe5ddc8833206381c43b002e95788d4cca0893a/modules/text_generation.py#L319
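If the token string is known in advance, it could be looked up and appended along these lines (a sketch only, not the actual webui code; hardcoding '<|eot_id|>' is model-specific, which is part of the problem):

from modules import shared  # the webui's shared state

# Sketch: start from the tokenizer's EOS id and add <|eot_id|> when the vocab knows it.
eos_token_ids = [int(shared.tokenizer.eos_token_id)] if shared.tokenizer.eos_token_id is not None else []

eot_id = shared.tokenizer.convert_tokens_to_ids("<|eot_id|>")
if eot_id is not None and eot_id != shared.tokenizer.unk_token_id:
    eos_token_ids.append(int(eot_id))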

oobabooga avatar May 20 '24 02:05 oobabooga

I think that the problem here is that the eot_id cannot be obtained from the shared.tokenizer object loaded through AutoTokenizer.from_pretrained. This can be verified by checking the attributes under dir(shared.tokenizer). The EOS token is there, but not the EOT token, which is new.

If the EOT token were known, it could simply be added to this list

https://github.com/oobabooga/text-generation-webui/blob/abe5ddc8833206381c43b002e95788d4cca0893a/modules/text_generation.py#L319

Yes, we could add some custom EOS tokens. But the eos_token_id parameter doesn't seem to work for llama.cpp / GGUF. I couldn't find a suitable generalized solution, so I didn't make a PR.

Yiximail avatar May 22 '24 10:05 Yiximail

I think that this may have fixed the issue by looking at the eot token in string format rather than its ID.

https://github.com/oobabooga/text-generation-webui/commit/5499bc9bc8d2b24f163c0026dce05df21a25a691
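The gist of that approach, in a simplified sketch (not the commit's exact code), is to match the terminator's string form in the decoded output instead of relying on token ids:

# Simplified idea: treat the string form of the turn terminator as a stop condition.
EOT_STRINGS = ["<|eot_id|>", "<|end_of_text|>"]

def reached_eot(decoded_reply: str) -> bool:
    return any(s in decoded_reply for s in EOT_STRINGS)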

oobabooga avatar May 22 '24 16:05 oobabooga

Thanks again

Meisamrzpr avatar May 22 '24 19:05 Meisamrzpr

For API I had to manually insert in completions.py the fields: 'skip_special_tokens': False, 'custom_stopping_strings': '"<eot_id>"' as the other side doesnt insert those fields. I think the API should complete the missing parameters from defaults of the webapi that are configured

Have you changed generate_params.update line in completions.py? I have updated it with your lines of code but it didn't work. Actually for me it doesn't work anywhere. It doesn't work in Chat or Notebook tabs, it also doesn't work in API it goes on and on. In case of Meta-Llama-3-70B-Instruct.IQ2_XS.gguf the only way to fix it in Chat mode is to add "assistant" in Parameters/Custom stopping strings, but obviously it doesn't have any effect on API. I really do not understand how the hottest local LLM on the market still doesn't work properly in Oobabooga! The fix should be as easy as adding additional stopping token.

As I mentioned above, these "fixed" models worked for me (I didn't have to edit anything) Can you try with one of them?

Link to the models: https://huggingface.co/AI-Engine/Meta-Llama-3-8B-Instruct-GGUF/tree/main

PS: The fixed models have "with_temp_stop_token_fix" in the name.

Just to make sure I understand: using this model doesn't require any other changes to text-generation-webui or any model files, because GGUF files are something like an all-inclusive file?

And using that model will result in a llama3 chat that does not repeat the text again and again?

B0rner avatar Jul 04 '24 15:07 B0rner