Bug when loading a model for GRPO training without PEFT

Open VanderpoelLiam opened this issue 1 year ago • 2 comments

Issue #1632 has regressed. If I run the Qwen2.5_(3B)-GRPO.ipynb notebook and comment out the following:

# model = FastLanguageModel.get_peft_model(
#     model,
#     r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
#     target_modules = [
#         "q_proj", "k_proj", "v_proj", "o_proj",
#         "gate_proj", "up_proj", "down_proj",
#     ], # Remove QKVO if out of memory
#     lora_alpha = lora_rank,
#     use_gradient_checkpointing = "unsloth", # Enable long context finetuning
#     random_state = 3407,
# )

then you get the "LLMEngine should not be pickled!" error when the GRPOTrainer is constructed:

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

[<ipython-input-7-f0c0f43b49a3>](https://localhost:8080/#) in <cell line: 0>()
----> 1 trainer = GRPOTrainer(
      2     model = model,
      3     processing_class = tokenizer,
      4     reward_funcs = [
      5         xmlcount_reward_func,

13 frames

[/usr/local/lib/python3.11/dist-packages/vllm/engine/llm_engine.py](https://localhost:8080/#) in __reduce__(self)
    500         # This is to ensure that the LLMEngine is not referenced in
    501         # the closure used to initialize Ray worker actors
--> 502         raise RuntimeError("LLMEngine should not be pickled!")
    503 
    504     def __del__(self):

RuntimeError: LLMEngine should not be pickled!
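The mechanism behind this message can be reproduced in isolation. The following is a minimal sketch, assuming nothing about vLLM beyond what the traceback shows (a `__reduce__` method that raises `RuntimeError`). `FakeEngine` and `ModelWrapper` are hypothetical stand-ins, not vLLM or Unsloth classes, and the 13 hidden frames mean the thread does not establish which of these paths GRPOTrainer actually takes.

```python
import copy
import pickle


class FakeEngine:
    """Hypothetical stand-in for an object that refuses to be pickled."""

    def __reduce__(self):
        # Mirrors the pattern visible in the traceback: any attempt to
        # serialize the object raises instead of returning reduce data.
        raise RuntimeError("FakeEngine should not be pickled!")


class ModelWrapper:
    """Hypothetical model object that keeps a reference to the engine."""

    def __init__(self):
        self.engine = FakeEngine()


def try_operation(fn, obj):
    """Run fn(obj) and return the RuntimeError message, or None on success."""
    try:
        fn(obj)
    except RuntimeError as exc:
        return str(exc)
    return None


# Pickling the engine directly calls __reduce__ and fails.
pickle_error = try_operation(pickle.dumps, FakeEngine())

# copy.deepcopy also goes through __reduce_ex__/__reduce__, so copying an
# object that merely holds a reference to the engine fails the same way.
deepcopy_error = try_operation(copy.deepcopy, ModelWrapper())

print(pickle_error)
print(deepcopy_error)
```

If something like this is what happens here, the place to look would be wherever the model object that carries the vLLM engine gets copied or serialized during trainer setup.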

I am unsure what the fix is. If you point me in the right direction, I am happy to investigate.

VanderpoelLiam avatar Mar 04 '25 14:03 VanderpoelLiam

@Erland366 Could you check whether vLLM still works if no LoRA adapters are added? I think you also had a PR on moving load_lora outside of get_peft_model.

danielhanchen avatar Mar 06 '25 10:03 danielhanchen

Sorry for the very late reply. I was finally able to get back to Unsloth work.

I don't think you can train a non-LoRA model with Unsloth in general. When I tested inference, it did work.

Erland366 avatar Mar 09 '25 12:03 Erland366

@Erland366 Is there any plan to add support for training? If not, I think you should make it clearer that Unsloth only works in combination with LoRA adapters.

VanderpoelLiam avatar Mar 12 '25 10:03 VanderpoelLiam

> Sorry for the very late reply. I was finally able to get back to Unsloth work.
>
> I don't think you can train a non-LoRA model with Unsloth in general. When I tested inference, it did work.

Is that really so? Does Unsloth only support training LoRA models?

li-aolong avatar Mar 14 '25 15:03 li-aolong

How did this turn out? Were you able to train a non-LoRA model with Unsloth?

qwerty3564 avatar Mar 20 '25 13:03 qwerty3564

> How did this turn out? Were you able to train a non-LoRA model with Unsloth?

@qwerty3564 It looks like non-LoRA training is not supported.

cht619 avatar Apr 22 '25 08:04 cht619