GRPOTrainer example works with trl but generates "noise" with unsloth
Hi, I'm running a simple example of GRPOTrainer in plain trl and it runs fine (using the very same conda env I use for unsloth).
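The trl-only script follows the standard GRPOTrainer quickstart pattern. A simplified sketch (the model name, dataset, and length-based reward below are stand-ins, not the exact ones from my run):

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward that penalizes long completions; it also prints the first
# completion of each batch (the "reward_function completions:" lines below
# are produced by a print like this).
def reward_function(completions, **kwargs):
    print("reward_function completions:", completions[0])
    return [-abs(50 - len(c)) for c in completions]

training_args = GRPOConfig(output_dir = "outputs_trl", logging_steps = 1)

trainer = GRPOTrainer(
    model = "meta-llama/Llama-3.2-3B-Instruct",
    reward_funcs = reward_function,
    args = training_args,
    train_dataset = dataset,
)
trainer.train()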
After MANY iterations the text becomes garbage, but I think that's reasonable given the reward function used.
I tried to port this to unsloth; it runs, but the model generates "noise" after the very first fine-tuning iteration.
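The unsloth side of the port looks roughly like this (a simplified sketch: the model name, LoRA settings, and the dataset/reward wiring are placeholders; note that fast_inference is not set here):

from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer

max_seq_length = 1024

# Load the 4-bit model through unsloth instead of plain transformers.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)

# Attach LoRA adapters for training.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

# Same dataset and reward_function as in the trl script above.
trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = reward_function,
    args = GRPOConfig(output_dir = "outputs_unsloth", logging_steps = 1),
    train_dataset = dataset,
)
trainer.train()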
First completion is fine:
reward_function completions: I got blamed, and the girl is in the same classes, for what i didn't do.
The following ones are "noise":
reward_function completions: back.Peeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee reward_function completions: .Pee.Pee est.Peeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
Environment details and full log:
This might be related to https://github.com/unslothai/unsloth/issues/1836, but I'm already using Python 3.11.11.
Also https://github.com/unslothai/unsloth/issues/1672: I tried unsloth 2025.2.12, but it's still the same.
I also tried unsloth/llama-3-8b-bnb-4bit with the same results.
What am I doing wrong?
Thanks
I encountered the same issue as you did. I checked all the installation versions on the official Colab and ensured that they were consistent, but the problem still persisted. Eventually, I set vllm_cache=True and found that the model could run normally and generate proper sequences. To be more specific, the settings are as follows:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_path,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    fast_inference=True, # set True if you want vLLM fast inference
    max_lora_rank=lora_rank,
    gpu_memory_utilization=0.7
)

training_args_lyrics = GRPOConfig(
    use_vllm = True, # use vLLM for fast inference!
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "paged_adamw_8bit",
    logging_steps = 1,
    bf16 = is_bfloat16_supported(),
    fp16 = not is_bfloat16_supported(),
    per_device_train_batch_size = 4,
    gradient_accumulation_steps = 2, # Increase to 4 for smoother training
    num_generations = 8, # Decrease if out of memory
    max_prompt_length = 768,
    max_completion_length = 768,
    num_train_epochs = 2, # Set to 1 for a full training run
    # max_steps = 50,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none", # Can use Weights & Biases
    output_dir = "outputs_lyrics_phase",
)
With these settings, the program runs smoothly. It seems that the current models only support vllm-based gradient backpropagation. Without enabling vllm_cache, the first batch of data might be normal, but subsequent batches often encounter repetitive issues. However, once vllm_cache is turned on, the aforementioned problems are resolved!
@StarLight1212, I cannot see vllm_cache=True anywhere in your snippet.
@StarLight1212 Thank you very much for your detailed answer :)
I added fast_inference, enforce_eager, and gpu_memory_utilization here:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    #model_name = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    fast_inference = True,            # route generation through vLLM
    enforce_eager = True,             # disable vLLM CUDA graph capture
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
    gpu_memory_utilization = 0.7,     # fraction of VRAM reserved for vLLM
)
and set use_vllm = True in the GRPOConfig.
Now it works with the Llama 3B and 8B!
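On the config side the only change is that one flag (everything else left as it was):

training_args = GRPOConfig(
    use_vllm = True,   # generate completions through vLLM
    # ... all other arguments unchanged ...
)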
Note:
pip install vllm
downgraded several core packages, but everything still works fine.
I have the same problem; it is not resolved yet.
I found a way to resolve it: downgrade unsloth to 2025.2.12 and do not use vLLM. You can also refer to #1810.
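For example (assuming the pinned version string matches the PyPI release name):

pip install "unsloth==2025.2.12"

and leave fast_inference / use_vllm at their defaults (off).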
@nottrz @DiTo97 @Summer142857 Apologies, just fixed! For Colab / Kaggle, please restart and run all. For local machines, please do:
pip install --force-reinstall --upgrade --no-cache-dir --no-deps unsloth unsloth_zoo
If this issue still has not been resolved, feel free to open a new issue, but I'll be closing this one for now!
@StarLight1212 How do you turn on vllm_cache? It's not in your code snippet.