
[FIXED] Llama 3 Finetuned model is not generating EOS token.

Open KillerShoaib opened this issue 1 year ago • 7 comments

I've fine-tuned the Llama 3 8B model. I followed the notebook and only changed the dataset. The dataset is similar to the Alpaca dataset but in Bangla. I trained the model for 1 epoch (36 hrs) on a single T4 GPU. However, when I try to generate a response, the model never produces an EOS token; it keeps going until it hits the max_new_tokens limit and stops.

Here is a sample of the code that creates the dataset (the same as the Colab notebook; only the dataset name and system prompt are changed).

code:

alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("iamshnoo/alpaca-cleaned-bengali", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

A single example from dataset['text'] looks like this:

'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nāĻĒāĻĻāĻžāĻ°ā§āĻĨ⧇āϰ āĻĒāϰāĻŋāĻŦāĻ°ā§āϤāύāĻļā§€āϞāϤāĻž āĻāϰ āĻŦ⧈āĻœā§āĻžāĻžāύāĻŋāĻ• āϏāĻ‚āĻœā§āĻžāĻž āĻ•āĻŋ?\n\n### Input:\n\n\n### Response:\nāĻŦāĻŋāĻĒāĻžāĻ• āĻāĻ•āϟāĻŋ āĻœā§€āĻŦ⧇āϰ āĻŽāĻ§ā§āϝ⧇ āϘāĻŸā§‡ āϝāĻžāĻ“āϝāĻŧāĻž āϏāĻŽāĻ¸ā§āϤ āϜ⧈āĻŦ āϰāĻžāϏāĻžāϝāĻŧāύāĻŋāĻ• āĻŦāĻŋāĻ•ā§āϰāĻŋāϝāĻŧāĻžāϕ⧇ āĻŦā§‹āĻāĻžāϝāĻŧ, āϝāĻžāϰ āĻŽāĻ§ā§āϝ⧇ āĻāĻŽāύ āĻĒā§āϰāϤāĻŋāĻ•ā§āϰāĻŋāϝāĻŧāĻž āϰāϝāĻŧ⧇āϛ⧇ āϝāĻž āĻļāĻ•ā§āϤāĻŋ āωāĻ¤ā§āĻĒāĻžāĻĻāύ āĻ•āϰāϤ⧇ āĻ…āϪ⧁ āĻ­āĻžāĻ™ā§āĻ—āϤ⧇ āĻĒāĻžāϰ⧇ (āĻ•ā§āϝāĻžāϟāĻžāĻŦāϞāĻŋāϜāĻŽ) āĻāĻŦāĻ‚ āύāϤ⧁āύ āĻ…āϪ⧁ āϤ⧈āϰāĻŋ āĻ•āϰ⧇ (āĻ…ā§āϝāĻžāύāĻžāĻŦāϞāĻŋāϜāĻŽ) āĨ¤ āĻāχ āĻĒā§āϰāϤāĻŋāĻ•ā§āϰāĻŋāϝāĻŧāĻžāϗ⧁āϞāĻŋ āĻāύāϜāĻžāχāĻŽ āĻĻā§āĻŦāĻžāϰāĻž āϏāĻšāϜāϤāϰ āĻšāϝāĻŧ āĻāĻŦāĻ‚ āĻŦ⧃āĻĻā§āϧāĻŋ, āĻĒā§āϰāϜāύāύ āĻāĻŦāĻ‚ āĻĒāϰāĻŋāĻŦ⧇āĻļ⧇āϰ āĻĒā§āϰāϤāĻŋāĻ•ā§āϰāĻŋāϝāĻŧāĻž āĻšāĻŋāϏāĻžāĻŦ⧇ āĻĒā§āϰāϝāĻŧā§‹āϜāύ⧀āϝāĻŧ āĻĒā§āϰāĻ•ā§āϰāĻŋāϝāĻŧāĻžāϗ⧁āϞāĻŋāϰ āĻŽāĻžāĻ§ā§āϝāĻŽā§‡ āĻœā§€āĻŦāύ āĻŦāϜāĻžāϝāĻŧ āϰāĻžāĻ–āĻžāϰ āϜāĻ¨ā§āϝ āĻĒā§āϰāϝāĻŧā§‹āϜāύ⧀āϝāĻŧāĨ¤ āĻŦāĻŋāĻĒāĻžāĻ• āĻŦāĻŋāĻļ⧇āώāϤ āĻ–āĻžāĻĻā§āϝ⧇āϰ āĻ­āĻžāĻ™ā§āĻ—āύ āĻāĻŦāĻ‚ āĻāϟāĻŋ āĻļāĻ•ā§āϤāĻŋāϤ⧇ āϰ⧂āĻĒāĻžāĻ¨ā§āϤāϰāĻŋāϤ āĻšāϤ⧇ āĻĒāĻžāϰ⧇āĨ¤<|end_of_text|>'

The EOS token has been added at the end of the text.

Here is the generation code (same as the notebook):

# alpaca_prompt = Copied from above
alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "āϏ⧁āĻ¸ā§āĻĨ āĻĨāĻžāĻ•āĻžāϰ āϤāĻŋāύāϟāĻŋ āωāĻĒāĻžāϝāĻŧ āĻŦāϞ⧁āύ", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True)
tokenizer.batch_decode(outputs)
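(As a quick sanity check, not part of the notebook, one can also inspect whether the EOS id appears anywhere among the newly generated ids:)

# Diagnostic sketch: check whether any newly generated token is the EOS token
generated_ids = outputs[0][inputs["input_ids"].shape[1]:]  # drop the prompt tokens
print("contains EOS:", (generated_ids == tokenizer.eos_token_id).any().item())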

Here is the response output:

['<|begin_of_text|>Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.\n\n### Instruction:\nāϏ⧁āĻ¸ā§āĻĨ āĻĨāĻžāĻ•āĻžāϰ āϤāĻŋāύāϟāĻŋ āωāĻĒāĻžāϝāĻŧ āĻŦāϞ⧁āύ\n\n### Input:\n\n\n### Response:\nā§§. āύāĻŋāϝāĻŧāĻŽāĻŋāϤ āĻŦā§āϝāĻžāϝāĻŧāĻžāĻŽ āĻ•āϰ⧁āύ: āύāĻŋāϝāĻŧāĻŽāĻŋāϤ āĻļāĻžāϰ⧀āϰāĻŋāĻ• āĻ•ā§āϰāĻŋāϝāĻŧāĻžāĻ•āϞāĻžāĻĒ āĻ•āϰāĻž āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ āĻāĻŦāĻ‚ āϏ⧁āĻ¸ā§āĻĨāϤāĻž āĻŦāϜāĻžāϝāĻŧ āϰāĻžāĻ–āϤ⧇ āϏāĻšāĻžāϝāĻŧāϤāĻž āĻ•āϰāϤ⧇ āĻĒāĻžāϰ⧇āĨ¤ āĻāϟāĻŋ āĻšāĻžāĻ°ā§āϟ āϰ⧋āĻ—, āĻĄāĻžāϝāĻŧāĻžāĻŦ⧇āϟāĻŋāϏ āĻāĻŦāĻ‚ āĻ¸ā§āĻĨā§‚āϞāϤāĻžāϰ āĻŽāϤ⧋ āĻĻā§€āĻ°ā§āϘāĻ¸ā§āĻĨāĻžāϝāĻŧā§€ āϰ⧋āϗ⧇āϰ āĻā§āρāĻ•āĻŋ āĻšā§āϰāĻžāϏ āĻ•āϰāϤ⧇ āĻĒāĻžāϰ⧇āĨ¤ ⧍. āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝāĻ•āϰ āĻ–āĻžāĻĻā§āϝ āĻ–āĻžāύāσ āĻāĻ•āϟāĻŋ āϏ⧁āώāĻŽ āĻāĻŦāĻ‚ āĻĒ⧁āĻˇā§āϟāĻŋāĻ•āϰ āĻĄāĻžāϝāĻŧ⧇āϟ āĻ–āĻžāĻ“āϝāĻŧāĻž āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ āĻāĻŦāĻ‚ āϏ⧁āĻ¸ā§āĻĨāϤāĻž āĻŦāϜāĻžāϝāĻŧ āϰāĻžāĻ–āϤ⧇ āϏāĻšāĻžāϝāĻŧāϤāĻž āĻ•āϰāϤ⧇ āĻĒāĻžāϰ⧇āĨ¤ āĻĢāϞ, āϏāĻŦāϜāĻŋ, āĻĒā§‚āĻ°ā§āĻŖ āĻļāĻ¸ā§āϝ, āϚāĻ°ā§āĻŦāĻŋāϝ⧁āĻ•ā§āϤ āĻĒā§āϰ⧋āϟāĻŋāύ āĻāĻŦāĻ‚ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝāĻ•āϰ āĻĢā§āϝāĻžāϟ āϏāĻš āĻāĻ•āϟāĻŋ āĻ­āĻžāϰāϏāĻžāĻŽā§āϝāĻĒā§‚āĻ°ā§āĻŖ āĻĄāĻžāϝāĻŧ⧇āϟ āĻ–āĻžāĻ“āϝāĻŧāĻž āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰāϕ⧇ āϏāĻ āĻŋāĻ•āĻ­āĻžāĻŦ⧇ āĻ•āĻžāϜ āĻ•āϰāϤ⧇ āϏāĻšāĻžāϝāĻŧāϤāĻž āĻ•āϰāϤ⧇ āĻĒāĻžāϰ⧇āĨ¤ ā§Š. āĻĒāĻ°ā§āϝāĻžāĻĒā§āϤ āϘ⧁āĻŽ āĻĒāĻžāύāσ āĻĒāĻ°ā§āϝāĻžāĻĒā§āϤ āϘ⧁āĻŽ āĻĒāĻžāĻ“āϝāĻŧāĻž āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ āĻāĻŦāĻ‚ āϏ⧁āĻ¸ā§āĻĨāϤāĻž āĻŦāϜāĻžāϝāĻŧ āϰāĻžāĻ–āϤ⧇ āϗ⧁āϰ⧁āĻ¤ā§āĻŦāĻĒā§‚āĻ°ā§āĻŖāĨ¤ āĻĒā§āϰāϤāĻŋ āϰāĻžāϤ⧇ āĻ•āĻŽāĻĒāĻ•ā§āώ⧇ 7-8 āϘāĻ¨ā§āϟāĻž āϘ⧁āĻŽ āĻĒāĻžāĻ“āϝāĻŧāĻž āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āϘ⧁āĻŽā§‡āϰ āĻ…āĻ­āĻžāĻŦ āφāĻĒāύāĻžāϰ āχāĻŽāĻŋāωāύ āϏāĻŋāĻ¸ā§āĻŸā§‡āĻŽāϕ⧇ āĻĻ⧁āĻ°ā§āĻŦāϞ āĻ•āϰāϤ⧇ āĻĒāĻžāϰ⧇, āϰ⧋āϗ⧇āϰ āĻā§āρāĻ•āĻŋ āĻŦāĻžāĻĄāĻŧāĻŋāϝāĻŧ⧇ āϤ⧁āϞāϤ⧇ āĻĒāĻžāϰ⧇ āĻāĻŦāĻ‚ āφāĻĒāύāĻžāϰ āĻŽāĻžāύāϏāĻŋāĻ• āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āωāĻĒāϰ āύ⧇āϤāĻŋāĻŦāĻžāϚāĻ• āĻĒā§āϰāĻ­āĻžāĻŦ āĻĢ⧇āϞāϤ⧇ āĻĒāĻžāϰ⧇āĨ¤ āϏ⧁āϤāϰāĻžāĻ‚ āĻĒāĻ°ā§āϝāĻžāĻĒā§āϤ āϘ⧁āĻŽ āĻĒāĻžāĻ“āϝāĻŧāĻž āφāĻĒāύāĻžāϰ āϏāĻžāĻŽāĻ—ā§āϰāĻŋāĻ• āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ āĻāĻŦāĻ‚ āϏ⧁āĻ¸ā§āĻĨāϤāĻž āĻŦāϜāĻžāϝāĻŧ āϰāĻžāĻ–āϤ⧇ āϗ⧁āϰ⧁āĻ¤ā§āĻŦāĻĒā§‚āĻ°ā§āĻŖāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ 
āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝāĨ¤ āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀']

I asked the model in Bangla, "Tell me three ways I can stay healthy," and the model generated a coherent response. But after finishing the answer it starts spamming "āĻāϟāĻŋ āφāĻĒāύāĻžāϰ āĻļāϰ⧀āϰ⧇āϰ āĻ¸ā§āĻŦāĻžāĻ¸ā§āĻĨā§āϝ⧇āϰ āϜāĻ¨ā§āϝ āĻ…āĻĒāϰāĻŋāĻšāĻžāĻ°ā§āϝ" (English translation: "It is necessary for your body") and keeps going until it hits the max_new_tokens limit. I've tried different questions, but the result is always the same; I haven't seen the model generate the EOS token even once.

The EOS token has been added to data['text'], so in theory, after fine-tuning, the model should learn to predict the EOS token. I have a total of 51k samples and fine-tuned the model for 1 epoch.

One thing I've noticed is that in the original Colab notebook, when the model was trained for 60 iterations, none of the generated responses contained the EOS token either.

KillerShoaib avatar May 03 '24 09:05 KillerShoaib

Did you consider using the Llama 3 chat template instead of the default one (check this notebook)? Alternatively, you could use a tool like guidance, which offers many options for stopping generation (for example via regex or substrings). However, you would need to convert your model to llama.cpp to use it with guidance. You lose Unsloth's inference speed-up, but you can run on CPU.
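If you stay with plain transformers generation, a minimal sketch of a substring-based stopping criterion could look like this (the stop string below is only an illustration; adjust it to your prompt format):

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSubstring(StoppingCriteria):
    # Stop generation once a given substring appears in the newly generated text.
    def __init__(self, tokenizer, stop_string, prompt_length):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_length = prompt_length  # number of prompt tokens to skip when decoding

    def __call__(self, input_ids, scores, **kwargs):
        new_text = self.tokenizer.decode(input_ids[0][self.prompt_length:])
        return self.stop_string in new_text

# Usage sketch: stop as soon as the model starts a new "### Instruction:" block
prompt_len = inputs["input_ids"].shape[1]
stopping = StoppingCriteriaList([StopOnSubstring(tokenizer, "### Instruction:", prompt_len)])
outputs = model.generate(**inputs, max_new_tokens = 2048, stopping_criteria = stopping)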

zifken avatar May 03 '24 13:05 zifken

I encountered the same problem. I added EOS to the training data, but during prediction, the output always continues to the maximum number of tokens.

DDCY220 avatar May 04 '24 11:05 DDCY220

I'm facing the same problem here.

mxtsai avatar May 04 '24 13:05 mxtsai

I've figured out the solution. Below is the code for those who just want the solution, not the details:

Solution Code:

# change the tokenizer's pad token so it no longer collides with the EOS token
tokenizer.add_special_tokens({"pad_token": "<|reserved_special_token_0|>"})
model.config.pad_token_id = tokenizer.pad_token_id  # keep the model config in sync
tokenizer.padding_side = 'right'  # pad on the right (otherwise SFTTrainer shows a warning)

Now, pass the model and tokenizer to SFTTrainer.
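For reference, a minimal sketch of what that looks like (the hyperparameters here are placeholders in the style of the Unsloth Alpaca notebook, not the exact values I trained with):

from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,              # the tokenizer with the new pad token
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        num_train_epochs = 1,
        learning_rate = 2e-4,
        output_dir = "outputs",
    ),
)
trainer.train()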

Details of the solution

  1. I came across a similar problem in this issue (it was for the Llama 2 model). A user pointed out that the pad_token_id and eos_token_id were the same, so during fine-tuning the loss function ignores both the pad token and the EOS token. As a result, the model never learns to predict the EOS token (see the sketch after this list).
  2. I then checked the pad_token_id and eos_token_id for unsloth-llama3 and found that both are the same:
print(f"Pad Token id: {tokenizer.pad_token_id} and Pad Token: {tokenizer.pad_token}")
print(f"EOS Token id: {tokenizer.eos_token_id} and EOS Token: {tokenizer.eos_token}")
>>> Pad Token id: 128001 and Pad Token: <|end_of_text|>
>>> EOS Token id: 128001 and EOS Token: <|end_of_text|>
  3. Now that I knew these two were the same, all I had to do was change the pad_token_id. I found this Stack Overflow question, which shows how to change the pad_token_id for the Falcon model.
  4. You cannot set the pad token to just any value; doing so throws a CUDA error. (I'm not sure, but I assume the error comes from a mismatch between the tokenizer vocab size and the model vocab size.)
  5. I looked into the Unsloth Llama 3 model's tokenizer.json file, and there are 251 reserved special tokens in total, from <|reserved_special_token_0|> to <|reserved_special_token_250|>. You can use any of these reserved special tokens as the pad token; I used the first one, <|reserved_special_token_0|>.
  6. The code to change the value is shown above.
  7. Now let's verify the pad_token_id and eos_token_id values:
print(f"Pad Token id: {tokenizer.pad_token_id} and Pad Token: {tokenizer.pad_token}")
print(f"EOS Token id: {tokenizer.eos_token_id} and EOS Token: {tokenizer.eos_token}")
>>> Pad Token id: 128002 and Pad Token: <|reserved_special_token_0|>
>>> EOS Token id: 128001 and EOS Token: <|end_of_text|>
  8. After that, I trained my fine-tuned Llama 3 model for just 30 extra iterations with the newly changed pad_token_id. This time I asked the model the same question as before, and it generated the EOS token and stopped before hitting the max_new_tokens limit. Below are two pictures showing the model's responses with the same vs. different eos_token and pad_token.

(Screenshots: pic1, response without EOS; pic2, response with EOS)
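To illustrate point 1, here is a simplified sketch of the label masking that a language-modelling collator performs (the real one is transformers' DataCollatorForLanguageModeling, which SFTTrainer uses by default); when pad and EOS share an id, the real EOS gets masked out of the loss as well:

import torch

def build_labels(input_ids, pad_token_id):
    # Copy the inputs and set every pad position to -100 so the loss ignores it.
    labels = input_ids.clone()
    labels[input_ids == pad_token_id] = -100
    return labels

EOS = 128001

# Before the fix: padding reuses the EOS id, so the real EOS is masked out too.
seq_before = torch.tensor([10, 20, 30, EOS, EOS, EOS])      # content, real EOS, padding
print(build_labels(seq_before, pad_token_id=EOS))
# tensor([  10,   20,   30, -100, -100, -100])

# After the fix: a dedicated pad id (e.g. 128002) leaves the real EOS in the labels.
PAD = 128002
seq_after = torch.tensor([10, 20, 30, EOS, PAD, PAD])
print(build_labels(seq_after, pad_token_id=PAD))
# tensor([  10,   20,   30, 128001, -100, -100])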

I'm hoping UnslothAI will see this bug and fix it in their Colab notebook. Lots of people are facing this issue.

KillerShoaib avatar May 05 '24 10:05 KillerShoaib

@KillerShoaib WHOOPS you are entirely correct!!!! I immediately updated all pad_tokens Unsloth has to <|reserved_special_token_250|> Thanks for the keen eye!!

danielhanchen avatar May 05 '24 12:05 danielhanchen

OMG, thank you for the solution here. It was driving me nuts trying to figure out why Llama 3 was rambling more the more I trained it.

Nazzaroth2 avatar May 05 '24 14:05 Nazzaroth2

I suggest using <|end_of_text|> as the pad token and <|eot_id|> as the EOS token.
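A minimal sketch of that suggestion (it assumes the Llama 3 instruct/chat template, where <|eot_id|> terminates each turn; verify both tokens exist in your tokenizer before relying on this):

tokenizer.pad_token = "<|end_of_text|>"   # now free to act as padding
tokenizer.eos_token = "<|eot_id|>"        # the token the chat template actually ends turns with
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id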

tdolega avatar May 15 '24 00:05 tdolega

This issue still seems to be ongoing when using the stock meta-llama models (e.g., meta-llama/Meta-Llama-3-8B-Instruct) while following the Colab notebook example here: https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing

I can confirm, however, that the issue is not present when using the exact same code with the Unsloth version of the model (e.g., unsloth/llama-3-8b-Instruct-bnb-4bit).

Long story short, it seems like there is still some issue with EOS for the stock Llama 3 models.
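A quick way to compare the two tokenizers (a sketch; the meta-llama repo requires accepting the license on Hugging Face):

from transformers import AutoTokenizer

for name in ["meta-llama/Meta-Llama-3-8B-Instruct", "unsloth/llama-3-8b-Instruct"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name)
    print("  eos:", tok.eos_token, tok.eos_token_id)
    print("  pad:", tok.pad_token, tok.pad_token_id)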

davedgd avatar May 30 '24 15:05 davedgd

@davedgd Oh so Unsloth is fine (the models or just finetuning with Unsloth?) but the Meta ones still don't work as expected?

danielhanchen avatar May 30 '24 18:05 danielhanchen

@davedgd Oh so Unsloth is fine (the models or just finetuning with Unsloth?) but the Meta ones still don't work as expected?

Correct, but to clarify: there are no issues when tuning the Unsloth-provided model with Unsloth, but I do have issues with the exact same code when using the meta-llama repository version of Llama 3 8B Instruct.

With the meta-llama fine-tune only (using the Llama 3 8B Instruct Colab notebook and swapping in my fine-tuning data), the fine-tuning goes great, but inference hangs for a while and runs until max length, producing results like this:

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have significantly reduced over time

Vehicle emissions have

PS. Thanks for the quick reply! :)

davedgd avatar May 30 '24 18:05 davedgd

@davedgd Oh that's a shame for Meta's official repo - well glad Unsloth works fine :)

danielhanchen avatar Jun 01 '24 10:06 danielhanchen


Hi, I am seeing some odd differences between unsloth/llama-3-8b-Instruct and Meta-Llama-3-8B-Instruct (the official HF one) with respect to the tokenizer and other .json files.

I guess I'm still confused... Does anyone know why Unsloth renamed (and apparently renumbered) the eos_token from 128001 to 128009, i.e. changed "eos_token": "<|end_of_text|>" to "eos_token": "<|eot_id|>", and then changed the pad_id from -1 to 128255? So is the Unsloth update to the HF model for Llama 3 some kind of bug fix for the original Llama 3 HF model?

*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/config.json	Thu May  9 18:46:47 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/config.json	Sat Jun  1 13:49:31 2024
***************
*** 1,11 ****
  {
    "architectures": [
      "LlamaForCausalLM"
    ],
    "attention_bias": false,
    "attention_dropout": 0.0,
    "bos_token_id": 128000,
!   "eos_token_id": 128001,
    "hidden_act": "silu",
    "hidden_size": 4096,
    "initializer_range": 0.02,
--- 1,12 ----
  {
+   "_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
    "architectures": [
      "LlamaForCausalLM"
    ],
    "attention_bias": false,
    "attention_dropout": 0.0,
    "bos_token_id": 128000,
!   "eos_token_id": 128009,
    "hidden_act": "silu",
    "hidden_size": 4096,
    "initializer_range": 0.02,
***************
*** 21,27 ****
    "rope_theta": 500000.0,
    "tie_word_embeddings": false,
    "torch_dtype": "bfloat16",
!   "transformers_version": "4.40.0.dev0",
    "use_cache": true,
    "vocab_size": 128256
  }
--- 22,28 ----
    "rope_theta": 500000.0,
    "tie_word_embeddings": false,
    "torch_dtype": "bfloat16",
!   "transformers_version": "4.38.2",
    "use_cache": true,
    "vocab_size": 128256
  }
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/generation_config.json	Thu May  9 18:46:47 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/generation_config.json	Sat Jun  1 13:49:31 2024
***************
*** 1,9 ****
  {
    "bos_token_id": 128000,
    "eos_token_id": [128001, 128009],
!   "do_sample": true,
!   "temperature": 0.6,
!   "max_length": 4096,
!   "top_p": 0.9,
!   "transformers_version": "4.40.0.dev0"
  }
--- 1,6 ----
  {
+   "_from_model_config": true,
    "bos_token_id": 128000,
    "eos_token_id": [128001, 128009],
!   "transformers_version": "4.38.2"
  }
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/special_tokens_map.json	Thu May  9 18:46:48 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/special_tokens_map.json	Sat Jun  1 13:49:31 2024
***************
*** 1,4 ****
  {
!   "bos_token": "<|begin_of_text|>",
!   "eos_token": "<|end_of_text|>"
  }
--- 1,23 ----
  {
!   "bos_token": {
!     "content": "<|begin_of_text|>",
!     "lstrip": false,
!     "normalized": false,
!     "rstrip": false,
!     "single_word": false
!   },
!   "eos_token": {
!     "content": "<|eot_id|>",
!     "lstrip": false,
!     "normalized": false,
!     "rstrip": false,
!     "single_word": false
!   },
!   "pad_token": {
!     "content": "<|reserved_special_token_250|>",
!     "lstrip": false,
!     "normalized": false,
!     "rstrip": false,
!     "single_word": false
!   }
  }
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/tokenizer_config.json	Thu May  9 18:46:48 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/tokenizer_config.json	Sat Jun  1 13:49:31 2024
***************
*** 2052,2062 ****
    "bos_token": "<|begin_of_text|>",
    "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
    "clean_up_tokenization_spaces": true,
!   "eos_token": "<|end_of_text|>",
    "model_input_names": [
      "input_ids",
      "attention_mask"
    ],
    "model_max_length": 1000000000000000019884624838656,
    "tokenizer_class": "PreTrainedTokenizerFast"
  }
--- 2052,2064 ----
    "bos_token": "<|begin_of_text|>",
    "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
    "clean_up_tokenization_spaces": true,
!   "eos_token": "<|eot_id|>",
    "model_input_names": [
      "input_ids",
      "attention_mask"
    ],
    "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<|reserved_special_token_250|>",
+   "padding_side": "left",
    "tokenizer_class": "PreTrainedTokenizerFast"
  }

devzzzero avatar Jun 01 '24 19:06 devzzzero

See https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/commit/4d6c61da057c45bfc4dc4d3bfa5a691ecb9ce0cf

Yes the pad token is in fact a bug fix

danielhanchen avatar Jun 02 '24 16:06 danielhanchen

See https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/commit/4d6c61da057c45bfc4dc4d3bfa5a691ecb9ce0cf

Yes the pad token is in fact a bug fix

Indeed. My pull of the official Llama3 hf models occurred more than 20 days ago :-) Thank you.

devzzzero avatar Jun 02 '24 16:06 devzzzero

Oh cool! Ye it got updated

danielhanchen avatar Jun 04 '24 15:06 danielhanchen


Same issue while I am testing Unsloth with Llama 3.1.