unsloth
Qwen2.5 (3B) - GRPO
I trained a model with the provided Qwen2.5 (3B) - GRPO notebook. Inference with the saved LoRA gave a response in the expected reasoning format.
But when I push the model to the HF Hub and use it for inference, I get this response:
There are two 'r's in the word "strawberry".
The reasoning format does not appear in this output. Please suggest what might be going wrong. I can share my training and inference notebooks.