unsloth icon indicating copy to clipboard operation
unsloth copied to clipboard

Qwen2.5 (3B) - GRPO

Open sudhir2016 opened this issue 3 days ago • 0 comments

I trained a model with the Qwen2.5 (3B) - GRPO notebook provided. I got this response on inference with saved lora.

To determine the number of r's in the word "strawberry," I will count each occurrence of the letter r in the word. The word "strawberry" contains the letter r twice: 1. In the beginning, "r". 2. In the middle, "rr". 2

But when I push the model to HF hub and use it for inference I get this response

There are two 'r's in the word "strawberry".

The reasoning format is not showing. Please suggest. I can share my training and inference notebooks.

sudhir2016 avatar Feb 21 '25 18:02 sudhir2016