web-llm

With the Llama-2-7b-chat-hf-q4f32_1-1k model, the prefill token count is 36 when the input is 'hello'

Open · 137591 opened this issue on May 14 '24 · 2 comments

Why does the prefill token count reported by the project differ from the count produced by the tokenizer? Using the LLaMA 2 tokenizer, the prompt 'hello' splits into only 2 tokens, yet the project reports a prefill of 36 tokens, and experiments confirm that every prefill count is 34 tokens larger than the raw prompt's token count. Please explain why.
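
For reference, a minimal sketch of reproducing the raw count in the browser with transformers.js (an assumption here, not part of the original report; meta-llama/Llama-2-7b-chat-hf is gated on the Hub, so any repo shipping the same tokenizer files would do):

  import { AutoTokenizer } from "@xenova/transformers";

  // Count tokens for the bare prompt, with no chat template applied.
  // Assumption: this gated repo id; substitute any Llama-2 tokenizer repo.
  const tokenizer = await AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf");
  const ids = tokenizer.encode("hello");
  console.log(ids.length); // 2: the BOS token plus "hello"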

137591 · May 14 '24

This is due to the system prompt shown in mlc-chat-config.json: https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC/blob/main/mlc-chat-config.json#L33-L34, which follows the specification of the official model release.
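
Concretely, here is a minimal sketch of the wrapping that produces those extra tokens, using the Llama-2 chat template from the official model card (the exact default system prompt lives in the linked mlc-chat-config.json; the function name here is illustrative):

  // Sketch of the Llama-2 chat template; web-llm applies equivalent wrapping,
  // with the default system prompt from mlc-chat-config.json, before
  // tokenizing, which is what the prefill count measures.
  function buildLlama2Prompt(system: string, user: string): string {
    // With an empty system prompt, the <<SYS>> block is dropped entirely.
    const sys = system ? `<<SYS>>\n${system}\n<</SYS>>\n\n` : "";
    return `[INST] ${sys}${user} [/INST]`;
  }

  // The fixed ~34-token overhead is this wrapper plus the default system
  // prompt, tokenized together with the user's "hello".
  console.log(buildLlama2Prompt("You are a helpful assistant.", "hello"));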

If you don't want to use a system prompt, try overriding it with an empty string:

  const request: webllm.ChatCompletionRequest = {
    messages: [
      // An empty system prompt overrides the default from mlc-chat-config.json.
      { role: "system", content: "" },
      { role: "user", content: "Hello" },
    ],
  };
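
For completeness, a sketch of the full round trip (assuming the current web-llm API; CreateMLCEngine and the model id below match the MLC naming used in the linked repo but may differ in your version):

  import * as webllm from "@mlc-ai/web-llm";

  // Load the model, then send the request with the empty system prompt.
  const engine = await webllm.CreateMLCEngine("Llama-2-7b-chat-hf-q4f16_1-MLC");
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "" },
      { role: "user", content: "Hello" },
    ],
  });
  console.log(reply.choices[0].message.content);
  // With the system prompt emptied, reply.usage?.prompt_tokens should be
  // close to the raw tokenizer count rather than ~34 tokens larger.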

CharlieFRuan · May 14 '24

Got it! Thank you very much!

137591 · May 14 '24