
The absence or presence of a system token results in different outputs.

Open Sneakr opened this issue 9 months ago • 6 comments

Describe the bug

As per the official documentation: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/

It is stated:

A prompt should contain a single system message, can contain multiple alternating user and assistant messages, and always ends with the last user message followed by the assistant header.

However, in the follow-up examples given in the documentation, the system token is only present if a system message is present:

1: Single message example

<|begin_of_text|><|start_header_id|>user<|end_header_id|>
{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

2: System prompt message added to a single user message

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

However, including the system token while leaving the system message string empty results in a completely different output compared to having no system token at all.

This can be seen here in my findings: https://github.com/ggerganov/llama.cpp/issues/7062#issuecomment-2100500582
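To make the variants concrete, here is a minimal Python sketch (my own illustration, not taken from the docs; exact whitespace may differ from the official template) that builds the three prompt strings under discussion: no system block, an empty system block, and a real system prompt.

BOT = "<|begin_of_text|>"
EOT = "<|eot_id|>"

def header(role):
    # Render a role header as in the Llama 3 prompt format
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"

def build_prompt(user_message, system_prompt=None, empty_system_header=False):
    # system_prompt=None and empty_system_header=False -> no system block at all
    # system_prompt=None and empty_system_header=True  -> system header with empty content
    parts = [BOT]
    if system_prompt is not None or empty_system_header:
        parts.append(header("system") + (system_prompt or "") + EOT)
    parts.append(header("user") + user_message + EOT)
    parts.append(header("assistant"))
    return "".join(parts)

print(build_prompt("Hello"))                                # variant 1: no system block
print(build_prompt("Hello", empty_system_header=True))      # the disputed case: empty system header
print(build_prompt("Hello", system_prompt="Be concise."))   # variant 2: real system prompt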

Fine-tuning the instruct model: Fine-tuning the instruct models with the system token present, and then running inference without the system tokens, breaks the fine-tuning.

Inference on the original instruct model: Since the outputs differ based on the presence of system tokens, the question arises: is the output better or worse for the instruct models? Which method produces the expected output given the instruct tuning that Meta has done internally?

Sneakr avatar May 10 '24 12:05 Sneakr

So, did META just change the model card page after my github issue, completely ignoring this issue? :)

https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/

Sneakr avatar May 11 '24 20:05 Sneakr

However, including the system token while leaving the system message string empty results in a completely different output compared to having no system token at all.

Are you referring to a case where you pass the system header but no system_prompt, i.e.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
<|eot_id|>

Getting a different output is expected behavior because the template is sensitive to the header; the model is expecting a system message but it is getting an empty string. If you don't have a system message it is better to not include the system header. This is how we encode dialogs https://github.com/meta-llama/llama3/blob/cc44ca2e1c269f0e56e6926d7f4837c983c060dc/llama/tokenizer.py#L202
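For readers who don't follow the link: a simplified, string-level paraphrase of that encoding logic (a sketch, not the actual ChatFormat code, which operates on token ids). The point is that each message gets its own header, so a system header is only emitted when the dialog actually contains a system message.

def encode_header(role):
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"

def encode_message(message):
    return encode_header(message["role"]) + message["content"].strip() + "<|eot_id|>"

def encode_dialog_prompt(dialog):
    # No special-casing of the system role: it only shows up if it is in the dialog.
    # The prompt ends with the assistant header to cue the model's reply.
    prompt = "<|begin_of_text|>"
    for message in dialog:
        prompt += encode_message(message)
    return prompt + encode_header("assistant")

# A dialog without a system message produces a prompt without a system header
print(encode_dialog_prompt([{"role": "user", "content": "Hi"}]))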

I don't think the changes to the model-card are related to this issue, but we'd appreciate your suggestions to improve its clarity :) cc @carljparker

subramen avatar May 14 '24 15:05 subramen

@subramen

Thanks for your response. Yes, that's what I'm referring to.

Getting a different output is expected behavior because the template is sensitive to the header; the model is expecting a system message but it is getting an empty string.

It is indeed expected behavior; since the input is different, the output will be different. However, the question is which output is the one intended by the authors of the model and its training process.

As per my findings, if the model has been trained with system headers present (in my case, fine-tuned):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

and inference is later run as per the tokenizer.py you referenced,

Conclusion: it produces a different output, which breaks the behaviour learned from the training process and the training data, if the system headers are not present at inference as they were during training.
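A hedged sketch of that mismatch (hypothetical helper names, not code from any library): the fine-tuning formatter always emits a system block, even when empty, while the inference path emits none.

def format_for_training(user_message, system_prompt=""):
    # Every fine-tuning example carries a system block, even with an empty prompt
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

def format_for_inference(user_message):
    # Inference (per the referenced tokenizer.py) emits no system block when none is given
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# The strings differ, so inference shows the fine-tuned model a prompt shape it never saw in training
assert format_for_training("Hello") != format_for_inference("Hello")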

If you don't have a system message it is better to not include the system header. This is how we encode dialogs

1: Why would it not be included if the model was trained with a system header? Wouldn't it be logical to assume that the outputs seen during training are the ones we should expect during inference, and therefore to keep the system headers as-is, regardless of whether the system message is empty?

2: What makes you conclude that it is better to leave out the system header? We have two different outputs; how do we conclude that one output (without system headers) is better than the other (with system headers)?

In my tests, the opposite is true: especially during tuning and training, leaving out tokens that were present during training breaks the expected output.

I'm grateful for clarification and your response! :)

In regards to the model card page, one can only speculate, and only the author of the page knows the reason for the changes. It is peculiar, however, that the wording I quoted was completely removed just a day after I opened this issue, with no clarification in this thread. But let's leave that aside and focus on the issue at hand.

Sneakr avatar May 14 '24 22:05 Sneakr

My response is based on the assumption that the model was NOT finetuned with a system header & null system prompt, i.e.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
<|eot_id|><|start_header_id|>user<|end_header_id|>
{user_msg}<|eot_id|>

So I would not expect it to give good results. If you are getting better results with a null prompt, that's interesting - if you can share it, please DM me on twitter (same handle as github username).

subramen avatar May 15 '24 16:05 subramen

My response is based on the assumption that the model was NOT finetuned with a system header & null system prompt, i.e.

No no, you are correct. The better result is when the model was trained with system headers and inference is later run with the system headers present too, regardless of a null system message.

My second question concerns the official Meta instruct model: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

Should the system headers be present or not, regardless of a null system prompt?
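One way to see exactly what each choice feeds the model is to render both variants with the Hugging Face chat template and diff the strings; a minimal sketch, assuming transformers is installed and you have access to the gated checkpoint:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

user_only = [{"role": "user", "content": "Hello"}]
empty_system = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Hello"},
]

# Render as text so the presence/absence of the system block is visible
without_header = tok.apply_chat_template(user_only, tokenize=False, add_generation_prompt=True)
with_header = tok.apply_chat_template(empty_system, tokenize=False, add_generation_prompt=True)

print(without_header)
print(with_header)
print("identical:", without_header == with_header)  # expected to differ by an empty system block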

Sneakr avatar May 15 '24 17:05 Sneakr

Just leaving this in here https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/sample_finetune.py

def apply_chat_template(
    example,
    tokenizer,
):
    messages = example["messages"]
    # Add an empty system message if there is none
    if messages[0]["role"] != "system":
        messages.insert(0, {"role": "system", "content": ""})
    example["text"] = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False)
    return example
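
For completeness, a minimal usage sketch of that helper (my own example, assuming the Hugging Face datasets and transformers libraries; the tiny in-memory dataset is a stand-in for a real fine-tuning corpus):

from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

raw = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi! How can I help?"},
    ]},
])

# Map the formatter over the dataset; fn_kwargs passes the tokenizer into apply_chat_template
formatted = raw.map(apply_chat_template, fn_kwargs={"tokenizer": tokenizer})
print(formatted[0]["text"])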

Edit: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/commit/bbd531db4632bb631b0c44d98172894a0c594dd0 After I raised a separate issue about Phi missing the system tokens in the tokenizer config, they removed the system tokens from the fine-tuning script because the model does not support them. However, this is not the case for Llama 3 Instruct, as the system token does appear to be supported by the model.

Sneakr avatar May 21 '24 17:05 Sneakr

@subramen Not sure why this was marked as completed, the issue has not been resolved or answered at all.

Sneakr avatar Jun 09 '24 20:06 Sneakr