ymcki

Results 49 comments of ymcki

https://github.com/unslothai/unsloth/issues/1810 Someone says we need to downgrade to 2025.2.12 to fix the vLLM problem.
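A minimal sketch of that downgrade, assuming 2025.2.12 refers to the `unsloth` package version (the linked issue does not spell out which package to pin, so treat this as an assumption):

```shell
# Assumed fix from the linked issue: pin unsloth back to 2025.2.12.
# The exact package to pin is an assumption; check the issue thread first.
pip install --no-cache-dir "unsloth==2025.2.12"
```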

Thanks for the reply. I am trying to fine-tune gemma-2-2b but I got this vLLM error:
```
Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.12/site-packages/peft/peft_model.py", line 824, in __getattr__
    return...
```

> > It should work now! Please update Unsloth via `pip install --upgrade --force-reinstall --no-cache-dir unsloth_zoo unsloth` then before your script, do:
> >
> > from unsloth import FastLanguageModel, PatchFastRL
> ...

> Thanks for the reply. I am trying to fine-tune gemma-2-2b but I got this vLLM error:
>
> But after I get rid of "use_vllm=True", it seems...

> Oh also Gemma doesn't work yet - I'll make it work later today! Currently only Llama, Mistral, Qwen, Phi type architectures work

What do you mean by "gemma doesn't...

I can confirm that vLLM works for llama-3.2-3b. It is expected to finish one epoch in 50 hours; in contrast, gemma-2-2b without vLLM is expected to take 175 hours.

> > A higher `num_generations` means more attempts at producing a good solution to a given sample.
> >
> > Yes, GRPO can only "work" if at least one good completion...

Is anyone getting positive rewards/soft_format_reward_func values using the example code? Mine is always stuck at zero despite rewards/strict_format_reward_func getting positive values. Isn't rewards/soft_format_reward_func less strict than rewards/strict_format_reward_func, such that it should be...
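For reference, a self-contained sketch of how the two format rewards differ. The function names follow the comment above, but the regex patterns and the 0.5 reward value are my assumptions about what the example notebook checks, not a copy of it:

```python
import re

# Assumed strict pattern: exact tag-per-line layout required, anchored to the whole string.
STRICT_PATTERN = r"^<reasoning>\n.*?\n</reasoning>\n<answer>\n.*?\n</answer>\n$"
# Assumed soft pattern: tags merely have to appear in order, whitespace flexible.
SOFT_PATTERN = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"

def strict_format_reward_func(completions):
    """Reward 0.5 only when a completion matches the strict layout exactly."""
    return [0.5 if re.match(STRICT_PATTERN, c, re.DOTALL) else 0.0 for c in completions]

def soft_format_reward_func(completions):
    """Reward 0.5 when the tags appear in order anywhere in the completion."""
    return [0.5 if re.search(SOFT_PATTERN, c, re.DOTALL) else 0.0 for c in completions]

# Any completion that passes the strict check also passes the soft one, so on the
# same batch soft rewards should always be >= strict rewards, never lower.
strict_ok = "<reasoning>\nbecause...\n</reasoning>\n<answer>\n42\n</answer>\n"
soft_only = "Sure! <reasoning>because...</reasoning> <answer>42</answer> done."

print(strict_format_reward_func([strict_ok, soft_only]))  # [0.5, 0.0]
print(soft_format_reward_func([strict_ok, soft_only]))    # [0.5, 0.5]
```

Under these assumed patterns, a soft reward stuck at zero while the strict reward is positive would be surprising, since the strict match implies the soft match.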

Has anyone observed self-correction behavior while training? I found one case near the end of the second epoch while training Llama-3.1-8B. However, it self-corrected the correct answer into a wrong...

I too have this error. I didn't expect a bug this new. Maybe YouTube just changed their stuff?