Daniel Han

1103 comments by Daniel Han

@tongyx361 Apologies for the delay! Yes, the new transformers update broke saving, so you need to overwrite the old tokenizer files by redownloading them.
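A minimal sketch of the "overwrite by redownloading" step, assuming the tokenizer was saved to a local directory using the usual Hugging Face file names. `TOKENIZER_FILES`, `remove_stale_tokenizer_files`, and `refresh_tokenizer` are hypothetical helpers for illustration; only `AutoTokenizer.from_pretrained(..., force_download=True)` and `save_pretrained` are real `transformers` calls.

```python
from pathlib import Path

# Common Hugging Face tokenizer file names (assumption: your save directory uses these).
TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "tokenizer.model",
    "vocab.json",
    "merges.txt",
)


def remove_stale_tokenizer_files(save_dir: str) -> list:
    """Delete old tokenizer files in save_dir and return the names that were removed."""
    removed = []
    for name in TOKENIZER_FILES:
        path = Path(save_dir) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


def refresh_tokenizer(model_name: str, save_dir: str):
    """Hypothetical helper: re-download the tokenizer and overwrite the local copy."""
    # Imported lazily so the cleanup helper above works without transformers installed.
    from transformers import AutoTokenizer

    remove_stale_tokenizer_files(save_dir)
    # force_download=True bypasses the local cache and fetches fresh files from the Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_name, force_download=True)
    tokenizer.save_pretrained(save_dir)
    return tokenizer
```

Only the listed tokenizer files are deleted, so model weights in the same directory are left alone.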

@katopz @srsugandh Could you ask this on our Discord? It's probably a better place to get this resolved.

GRPO leverages Qwen's own system prompt, so it's better to use:

```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("___unsloth_qwen_model__")
tokenizer.apply_chat_template([
    {"role" : "system", "content" : SYSTEM_PROMPT},
    {"role" :...
```
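A minimal sketch of passing a system prompt through the model's chat template, assuming a Hugging Face tokenizer for a Qwen chat model. The `SYSTEM_PROMPT` text and the `build_messages` / `render_prompt` helpers are hypothetical; only `apply_chat_template` and its `tokenize` / `add_generation_prompt` arguments are the real `transformers` API.

```python
# Hypothetical system prompt; substitute the one your GRPO run actually uses.
SYSTEM_PROMPT = "Respond with your reasoning first, then the final answer."


def build_messages(question: str, system_prompt: str = SYSTEM_PROMPT) -> list:
    """Return a chat-format conversation: system turn first, then the user turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def render_prompt(tokenizer, question: str) -> str:
    """Render the conversation with the model's own chat template.

    `tokenizer` is assumed to come from AutoTokenizer.from_pretrained(...).
    """
    return tokenizer.apply_chat_template(
        build_messages(question),
        tokenize=False,              # return a string instead of token ids
        add_generation_prompt=True,  # append the assistant header so the model answers next
    )
```

Keeping the system turn inside the messages list lets the tokenizer's template decide how it is formatted, instead of hand-concatenating strings.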

So yes, the system prompt is important, but the number of steps is also far too low. You probably need 500 to 2000 steps.
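To get a feel for what 500 to 2000 steps means for a given dataset, a rough back-of-the-envelope helper. `prompts_per_step` (how many distinct prompts one optimizer step consumes) depends on your batch and generation settings and is an assumption here, as are the example numbers.

```python
def epochs_covered(max_steps: int, prompts_per_step: int, dataset_size: int) -> float:
    """Passes over the dataset's prompts during training (1.0 == one full pass)."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    return max_steps * prompts_per_step / dataset_size


# Hypothetical run: 8 prompts per optimizer step on a 4,000-prompt dataset.
for steps in (50, 500, 2000):
    print(steps, "steps ->", epochs_covered(steps, 8, 4000), "epochs")
```

With these made-up numbers, 50 steps touches only a tenth of the data, while 500 steps is roughly one pass.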

Just fixed, apologies for the issue! For local machines, please run:

```
pip install "unsloth>=2025.3.8" "unsloth_zoo>=2025.3.7" --upgrade --force-reinstall --no-deps
```

For Colab / Kaggle machines, please disconnect and restart...
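After reinstalling, a quick way to confirm the environment picked up the fix is to compare installed versions against the minimums in the pip command. This is a minimal sketch using the standard library's `importlib.metadata`; `parse_version`, `meets_minimum`, and `check_installed` are hypothetical helpers, and the parser is deliberately simplified (it drops pre/post-release tags instead of ordering them).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the pip command above.
MINIMUMS = {"unsloth": "2025.3.8", "unsloth_zoo": "2025.3.7"}


def parse_version(text: str) -> tuple:
    """Turn '2025.3.10' into (2025, 3, 10); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '8rc1': keep 8, drop the suffix
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric version tuples, so 2025.3.10 correctly beats 2025.3.8."""
    return parse_version(installed) >= parse_version(minimum)


def check_installed() -> dict:
    """Report whether each package is installed at or above its minimum version."""
    status = {}
    for package, minimum in MINIMUMS.items():
        try:
            status[package] = meets_minimum(version(package), minimum)
        except PackageNotFoundError:
            status[package] = False
    return status
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where `"2025.3.10" < "2025.3.8"` as text.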

Oh, GRPO experiments are fine! These bugs relate to Unsloth internals (i.e., how I optimize files, etc.) and will not affect training runs.

@Cgrandjean Follow your original script, and try lowering `gpu_memory_utilization` to 0.5 or 0.4. I'm working on reducing VRAM consumption, and that should land in a few days!
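A rough sketch of why lowering `gpu_memory_utilization` helps, assuming it is the vLLM setting (passed through Unsloth when fast inference is enabled) that caps the fraction of total GPU memory the inference engine may claim. The 24 GB card and the `vllm_budget_gb` helper are hypothetical; the remainder is only an approximation of what is left for training.

```python
def vllm_budget_gb(total_vram_gb: float, gpu_memory_utilization: float) -> float:
    """VRAM (GB) the vLLM engine may claim at a given utilization fraction."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return total_vram_gb * gpu_memory_utilization


# Hypothetical 24 GB card: compare the two suggested settings.
TOTAL_GB = 24
for fraction in (0.5, 0.4):
    reserved = vllm_budget_gb(TOTAL_GB, fraction)
    print(f"{fraction}: vLLM up to {reserved:.1f} GB, about {TOTAL_GB - reserved:.1f} GB left for training")
```

Each 0.1 step down frees roughly a tenth of the card for the training side, at the cost of less room for vLLM's KV cache.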