unsloth
[FIXED] Llama 3 Finetuned model is not generating EOS token.
I've fine-tuned the Llama 3 8B model. I followed the notebook and only changed the dataset, which is similar to the Alpaca dataset but in Bangla. I trained the model for 1 epoch (36 hours) on a single T4 GPU. However, when I try to generate a response, the model never generates an EOS token: generation continues until it hits the max_new_tokens limit and then stops.
Here is the code that creates the dataset. (It is the same as the Colab notebook; I only changed the dataset name and the system prompt.)
code:
alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass
from datasets import load_dataset
dataset = load_dataset("iamshnoo/alpaca-cleaned-bengali", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
One single example of the dataset['text'] looks like this:
'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nপদার্থের পরিবর্তনশীলতা এর বৈজ্ঞানিক সংজ্ঞা কি?\n\n### Input:\n\n\n### Response:\nবিপাক একটি জীবের মধ্যে ঘটে যাওয়া সমস্ত জৈব রাসায়নিক বিক্রিয়াকে বোঝায়, যার মধ্যে এমন প্রতিক্রিয়া রয়েছে যা শক্তি উত্পাদন করতে অণু ভাঙ্গতে পারে (ক্যাটাবলিজম) এবং নতুন অণু তৈরি করে (অ্যানাবলিজম) । এই প্রতিক্রিয়াগুলি এনজাইম দ্বারা সহজতর হয় এবং বৃদ্ধি, প্রজনন এবং পরিবেশের প্রতিক্রিয়া হিসাবে প্রয়োজনীয় প্রক্রিয়াগুলির মাধ্যমে জীবন বজায় রাখার জন্য প্রয়োজনীয়। বিপাক বিশেষত খাদ্যের ভাঙ্গন এবং এটি শক্তিতে রূপান্তরিত হতে পারে।<|end_of_text|>'
The EOS token has been appended to the end of the text.
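To confirm this for the whole dataset rather than one example, a small check along these lines may help. It is only a sketch: `missing_eos` and `sample_texts` are hypothetical names, and the EOS string is the one visible in the sample above.

```python
EOS_TOKEN = "<|end_of_text|>"  # the EOS string visible at the end of the sample above


def missing_eos(texts, eos_token=EOS_TOKEN):
    """Return the indices of texts that do not end with eos_token."""
    return [i for i, text in enumerate(texts) if not text.endswith(eos_token)]


# Hypothetical stand-ins for dataset["text"].
sample_texts = [
    "### Response:\nfirst answer" + EOS_TOKEN,
    "### Response:\nsecond answer",  # EOS deliberately left off
]
print(missing_eos(sample_texts))
```

With the real data, this would be called as `missing_eos(dataset["text"])`. An empty result only shows that the EOS string is present in the text; it says nothing about whether the EOS token still contributes to the loss during training.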
Here is the generation code (same as the notebook):
# alpaca_prompt = Copied from above
alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "সুস্থ থাকার তিনটি উপায় বলুন", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True)
tokenizer.batch_decode(outputs)
Here is the response output :
['<|begin_of_text|>Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.\n\n### Instruction:\nসুস্থ থাকার তিনটি উপায় বলুন\n\n### Input:\n\n\n### Response:\n১. নিয়মিত ব্যায়াম করুন: নিয়মিত শারীরিক ক্রিয়াকলাপ করা আপনার শরীরের স্বাস্থ্য এবং সুস্থতা বজায় রাখতে সহায়তা করতে পারে। এটি হার্ট রোগ, ডায়াবেটিস এবং স্থূলতার মতো দীর্ঘস্থায়ী রোগের ঝুঁকি হ্রাস করতে পারে। ২. স্বাস্থ্যকর খাদ্য খানঃ একটি সুষম এবং পুষ্টিকর ডায়েট খাওয়া আপনার শরীরের স্বাস্থ্য এবং সুস্থতা বজায় রাখতে সহায়তা করতে পারে। ফল, সবজি, পূর্ণ শস্য, চর্বিযুক্ত প্রোটিন এবং স্বাস্থ্যকর ফ্যাট সহ একটি ভারসাম্যপূর্ণ ডায়েট খাওয়া আপনার শরীরকে সঠিকভাবে কাজ করতে সহায়তা করতে পারে। ৩. পর্যাপ্ত ঘুম পানঃ পর্যাপ্ত ঘুম পাওয়া আপনার শরীরের স্বাস্থ্য এবং সুস্থতা বজায় রাখতে গুরুত্বপূর্ণ। প্রতি রাতে কমপক্ষে 7-8 ঘন্টা ঘুম পাওয়া আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। ঘুমের অভাব আপনার ইমিউন সিস্টেমকে দুর্বল করতে পারে, রোগের ঝুঁকি বাড়িয়ে তুলতে পারে এবং আপনার মানসিক স্বাস্থ্যের উপর নেতিবাচক প্রভাব ফেলতে পারে। সুতরাং পর্যাপ্ত ঘুম পাওয়া আপনার সামগ্রিক স্বাস্থ্য এবং সুস্থতা বজায় রাখতে গুরুত্বপূর্ণ। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরী']
I asked the model in Bangla "Tell me 3 ways I can be healthy" and the model generated a coherent response. But after finishing the response, it starts spamming "এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য" (English translation: "It is necessary for your body"), and it continues until it hits the max_new_tokens length. I've tried different questions, but the result is always the same. I couldn't find a single case where the model generated the EOS token.
The EOS token has been added to dataset['text'], so in theory the fine-tuned model should learn to predict it. I have 51k samples in total and fine-tuned the model for 1 epoch.
One thing I've noticed is that in the original Colab notebook, when the model was trained for 60 iterations and then used to generate responses, none of the responses contained an EOS token.
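One way to test this claim on the token ids, rather than by reading decoded text, is to look for the EOS id among the newly generated tokens only. The helper below is a sketch with a hypothetical name; the ids 128000 (BOS) and 128001 (EOS) are the Llama 3 values that appear elsewhere in this thread, and the other ids are made up.

```python
def generated_eos_position(output_ids, prompt_length, eos_token_id):
    """Return the offset of the first EOS among the generated tokens, or None.

    output_ids is the full sequence (prompt plus generated tokens);
    prompt_length is how many leading ids belong to the prompt.
    """
    generated = output_ids[prompt_length:]
    for position, token_id in enumerate(generated):
        if token_id == eos_token_id:
            return position
    return None


EOS_ID = 128001
prompt = [128000, 10, 11]                 # hypothetical prompt ids
stopped = prompt + [12, 13, EOS_ID]       # generation that ended with EOS
runaway = prompt + [12, 13, 12, 13, 12]   # generation cut off by the length limit

print(generated_eos_position(stopped, len(prompt), EOS_ID))
print(generated_eos_position(runaway, len(prompt), EOS_ID))
```

With the generation code above, the arguments would presumably be `outputs[0].tolist()`, `inputs["input_ids"].shape[1]`, and `tokenizer.eos_token_id`. A result of None for every prompt matches the behaviour described here.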
Did you consider using the llama3 chat template instead of the default one (check this notebook) ?
Alternatively, you could use tools like guidance, which offers many options to stop generation (for example, regex or substrings). However, you will need to convert your model to llama.cpp to use it with guidance. You lose Unsloth's inference speed-up, but you can run on CPU.
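As a rough illustration of the substring idea only, the sketch below cuts already-decoded text at the earliest stop string. This is not guidance's API; `truncate_at_stop` is a hypothetical helper, and the stop strings are assumptions based on the Alpaca-style prompt in this thread.

```python
def truncate_at_stop(text, stop_strings):
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


raw = "A short answer.\n\n### Instruction:\nsomething the model made up"
print(repr(truncate_at_stop(raw, ["### Instruction:", "<|end_of_text|>"])))
```

Post-processing like this hides the runaway text but does not save any compute: the model still generates up to max_new_tokens. It also does nothing for loops that repeat an ordinary sentence, as in the output above.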
I encountered the same problem. I added EOS to the training data, but during prediction, the output always continues to the maximum number of tokens.
I'm facing the same problem here.
I've figured out the solution. Below is the code for those who just want the solution, not the details:
Solution Code:
# change the padding tokenizer value
tokenizer.add_special_tokens({"pad_token": "<|reserved_special_token_0|>"})
model.config.pad_token_id = tokenizer.pad_token_id # updating model config
tokenizer.padding_side = 'right' # padding to right (otherwise SFTTrainer shows warning)
Now, pass the model and tokenizer to SFTTrainer.
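Before starting a long training run, it may be worth checking the special-token ids. The function below is a minimal sketch with a hypothetical name; the values in the example calls (128001, 128002, and the vocabulary size 128256) are the ones reported in this thread.

```python
def check_special_token_ids(pad_token_id, eos_token_id, vocab_size):
    """Return a list of problems; an empty list means the ids look safe to train with."""
    problems = []
    if pad_token_id is None:
        problems.append("pad_token_id is not set")
    elif not 0 <= pad_token_id < vocab_size:
        problems.append("pad_token_id is outside the embedding table")
    elif pad_token_id == eos_token_id:
        problems.append("pad_token_id equals eos_token_id")
    return problems


# Default setup described in this thread: pad and EOS share an id.
print(check_special_token_ids(128001, 128001, 128256))
# After switching the pad token to <|reserved_special_token_0|>.
print(check_special_token_ids(128002, 128001, 128256))
```

With a loaded model this would be called as `check_special_token_ids(tokenizer.pad_token_id, tokenizer.eos_token_id, model.config.vocab_size)`.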
Details of the solution
- I came across a similar problem in this issue, which concerned the Llama 2 model. A user pointed out that pad_token_id and eos_token_id were the same. As a result, during fine-tuning the loss function ignores both pad_token and eos_token, so the model never learns to predict the eos_token.
- I then checked pad_token_id and eos_token_id for the Unsloth Llama 3 model and found that they are the same.
print(f"Pad Token id: {tokenizer.pad_token_id} and Pad Token: {tokenizer.pad_token}")
print(f"EOS Token id: {tokenizer.eos_token_id} and EOS Token: {tokenizer.eos_token}")
>>> Pad Token id: 128001 and Pad Token: <|end_of_text|>
>>> EOS Token id: 128001 and EOS Token: <|end_of_text|>
- Now that I knew the two were the same, all I had to do was change the pad_token_id. I found this Stack Overflow question, which shows how to change the pad_token_id for the Falcon model.
- You cannot set pad_token_id to an arbitrary value; doing so throws a CUDA error. (I'm not sure, but I assume the error comes from a mismatch between the tokenizer vocab size and the model vocab size.)
- I looked into the Unsloth Llama 3 model's tokenizer.json file and found a total of 251 reserved special tokens, ranging from <|reserved_special_token_0|> to <|reserved_special_token_250|>. You can use any of the reserved special tokens as the pad_token. I used the first one, <|reserved_special_token_0|>.
- The code to change the value is written above.
- Now let's verify the pad_token_id and eos_token_id values.
print(f"Pad Token id: {tokenizer.pad_token_id} and Pad Token: {tokenizer.pad_token}")
print(f"EOS Token id: {tokenizer.eos_token_id} and EOS Token: {tokenizer.eos_token}")
>>> Pad Token id: 128002 and Pad Token: <|reserved_special_token_0|>
>>> EOS Token id: 128001 and EOS Token: <|end_of_text|>
- After that, I trained my fine-tuned Llama 3 model for just an extra 30 iterations with the newly changed pad_token_id. This time I asked the model the same question as before, and it generated the eos_token and stopped before hitting the max_new_tokens length. Below I've shown 2 pictures showcasing the model's response with the same and with different eos_token and pad_token.
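The masking effect described in the first bullet can be reproduced without a model. The sketch below imitates what a padding-aware collator does, assuming labels are built by copying the input ids and replacing every pad id with -100 (the value PyTorch's cross-entropy loss ignores by default). The function name and the non-special token ids are made up.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_padding(input_ids, pad_token_id):
    """Copy input_ids into labels, replacing every pad id with IGNORE_INDEX."""
    return [IGNORE_INDEX if token == pad_token_id else token for token in input_ids]


EOS = 128001           # <|end_of_text|>
RESERVED_PAD = 128002  # <|reserved_special_token_0|>
tokens = [128000, 9906, 1917]  # hypothetical prompt-plus-response ids

# One real EOS at the end of the example, followed by two padding positions.
same = tokens + [EOS] + [EOS, EOS]                        # pad id == EOS id
distinct = tokens + [EOS] + [RESERVED_PAD, RESERVED_PAD]  # separate pad id

labels_same = mask_padding(same, pad_token_id=EOS)
labels_distinct = mask_padding(distinct, pad_token_id=RESERVED_PAD)

print(EOS in labels_same)      # False: the real EOS was masked along with the padding
print(EOS in labels_distinct)  # True: the real EOS is still a training target
```

In the first case the model never receives a gradient for predicting EOS, which is consistent with generation running until max_new_tokens.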
I'm hoping UnslothAI will see this bug and fix it in their Colab notebook. Lots of people are facing this issue.
@KillerShoaib WHOOPS you are entirely correct!!!! I immediately updated all pad_tokens Unsloth has to <|reserved_special_token_250|> Thanks for the keen eye!!
OMG Thank you for the solution here, was driving me nuts why llama3 was getting more rambling the more i trained it.
I suggest using <|end_of_text|> for pad token and <|eot_id|> for eos token.
This issue still seems to be ongoing when using the default meta-llama models (e.g., meta-llama/Meta-Llama-3-8B-Instruct) following the Colab notebook example here: https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing
I can confirm the issue is not present however using the exact same code with the unsloth version of the model (e.g., unsloth/llama-3-8b-Instruct-bnb-4bit).
Long story short, it seems there is still some issue with EOS for the stock Llama 3 models.
@davedgd Oh so Unsloth is fine (the models or just finetuning with Unsloth?) but the Meta ones still don't work as expected?
Correct, but to clarify, no issues when tuning the Unsloth provided model with Unsloth, but I have issues with the exact same code when using the meta-llama repository version of Llama 3 8B Instruct.
With the meta-llama fine-tune only (using the Llama 3 8B Instruct Colab notebook and swapping out for my fine tuning data), the fine tuning goes great but the inferencing will hang for a while and run until max length, producing results like this:
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have
PS. Thanks for the quick reply! :)
@davedgd Oh that's a shame for Meta's official repo - well glad Unsloth works fine :)
Hi, I am seeing some odd differences between unsloth/llama-3-8b-Instruct and Meta-Llama-3-8B-Instruct (the official HF one) in the tokenizer and other .json files.
I guess I'm still confused ...
Does anyone know why Unsloth changed the eos_token_id from 128001 to 128009, changed "eos_token": "<|end_of_text|>" to "eos_token": "<|eot_id|>", and changed the pad_id from -1 to 128255?
So is the Unsloth update to the HF model for Llama 3 some kind of bug fix for the original Llama 3 HF model?
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/config.json Thu May 9 18:46:47 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/config.json Sat Jun 1 13:49:31 2024
***************
*** 1,11 ****
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
! "eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
--- 1,12 ----
{
+ "_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
! "eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
***************
*** 21,27 ****
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
! "transformers_version": "4.40.0.dev0",
"use_cache": true,
"vocab_size": 128256
}
--- 22,28 ----
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
! "transformers_version": "4.38.2",
"use_cache": true,
"vocab_size": 128256
}
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/generation_config.json Thu May 9 18:46:47 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/generation_config.json Sat Jun 1 13:49:31 2024
***************
*** 1,9 ****
{
"bos_token_id": 128000,
"eos_token_id": [128001, 128009],
! "do_sample": true,
! "temperature": 0.6,
! "max_length": 4096,
! "top_p": 0.9,
! "transformers_version": "4.40.0.dev0"
}
--- 1,6 ----
{
+ "_from_model_config": true,
"bos_token_id": 128000,
"eos_token_id": [128001, 128009],
! "transformers_version": "4.38.2"
}
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/special_tokens_map.json Thu May 9 18:46:48 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/special_tokens_map.json Sat Jun 1 13:49:31 2024
***************
*** 1,4 ****
{
! "bos_token": "<|begin_of_text|>",
! "eos_token": "<|end_of_text|>"
}
--- 1,23 ----
{
! "bos_token": {
! "content": "<|begin_of_text|>",
! "lstrip": false,
! "normalized": false,
! "rstrip": false,
! "single_word": false
! },
! "eos_token": {
! "content": "<|eot_id|>",
! "lstrip": false,
! "normalized": false,
! "rstrip": false,
! "single_word": false
! },
! "pad_token": {
! "content": "<|reserved_special_token_250|>",
! "lstrip": false,
! "normalized": false,
! "rstrip": false,
! "single_word": false
! }
}
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/tokenizer_config.json Thu May 9 18:46:48 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/tokenizer_config.json Sat Jun 1 13:49:31 2024
***************
*** 2052,2062 ****
"bos_token": "<|begin_of_text|>",
"chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
"clean_up_tokenization_spaces": true,
! "eos_token": "<|end_of_text|>",
"model_input_names": [
"input_ids",
"attention_mask"
],
"model_max_length": 1000000000000000019884624838656,
"tokenizer_class": "PreTrainedTokenizerFast"
}
--- 2052,2064 ----
"bos_token": "<|begin_of_text|>",
"chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
"clean_up_tokenization_spaces": true,
! "eos_token": "<|eot_id|>",
"model_input_names": [
"input_ids",
"attention_mask"
],
"model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|reserved_special_token_250|>",
+ "padding_side": "left",
"tokenizer_class": "PreTrainedTokenizerFast"
}
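The token-related differences in the diff above can be summarised with a small comparison. This is only a sketch: `diff_settings` is a hypothetical helper, and the two dictionaries are typed in by hand from the values shown in the diff rather than loaded from the actual files.

```python
def diff_settings(left, right, keys):
    """Return {key: (left_value, right_value)} for the keys whose values differ."""
    changes = {}
    for key in keys:
        left_value, right_value = left.get(key), right.get(key)
        if left_value != right_value:
            changes[key] = (left_value, right_value)
    return changes


# Values copied from config.json and tokenizer_config.json in the diff above.
meta_settings = {
    "eos_token": "<|end_of_text|>",
    "eos_token_id": 128001,
}
unsloth_settings = {
    "eos_token": "<|eot_id|>",
    "eos_token_id": 128009,
    "pad_token": "<|reserved_special_token_250|>",
    "padding_side": "left",
}

keys = ["eos_token", "eos_token_id", "pad_token", "padding_side"]
for key, (old, new) in diff_settings(meta_settings, unsloth_settings, keys).items():
    print(f"{key}: {old!r} -> {new!r}")
```

A key that is missing on one side shows up as None, which is how the absent pad_token in the older Meta files appears here.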
See https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/commit/4d6c61da057c45bfc4dc4d3bfa5a691ecb9ce0cf
Yes the pad token is in fact a bug fix
Indeed. My pull of the official Llama3 hf models occurred more than 20 days ago :-) Thank you.
Oh cool! Ye it got updated
I'm seeing the same issue while testing Unsloth Llama 3.1.