Daniel Han
@Erland366 Could you check if vLLM works still if no LoRA adapters are added? I think you also had a PR on moving `load_lora` outside of `get_peft_model`
@wdlctc Thanks a lot again!! I'll test it and verify all losses match! Appreciate it!
Sorry on the delay - was planning to add this together with Vision support :) It might take a few more days!
Oh lol I noticed I accidentally deleted this PR after I deleted the nightly branch - whoops so sorry!
Interesting - so I looked through the paper and code. Essentially you're proposing to do gradient accumulation inside of each sequence length? I.e. the first is normally chunking the CE...
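A minimal sketch of the idea being discussed - chunking the cross-entropy over the sequence and accumulating, like gradient accumulation within one sequence. This is an illustrative NumPy toy (the helper names and shapes are assumptions, not Unsloth's actual implementation); the key point is that re-weighting each chunk's mean loss by its token count recovers the full-sequence loss:

```python
import numpy as np

def cross_entropy(logits, labels):
    # Standard softmax cross-entropy, averaged over tokens.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    logprobs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -logprobs[np.arange(len(labels)), labels].mean()

def chunked_cross_entropy(logits, labels, chunk_size):
    # Accumulate per-chunk losses (weighted by chunk length) so the
    # final mean matches the unchunked loss exactly.
    total, n = 0.0, len(labels)
    for i in range(0, n, chunk_size):
        chunk_labels = labels[i:i + chunk_size]
        total += cross_entropy(logits[i:i + chunk_size], chunk_labels) * len(chunk_labels)
    return total / n

rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 5))
labels = rng.integers(0, 5, size=10)
full = cross_entropy(logits, labels)
chunked = chunked_cross_entropy(logits, labels, chunk_size=3)
```

The weighting step matters: averaging the per-chunk means directly would bias the result whenever the last chunk is shorter than the others.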
@Erland366 Could you confirm if my latest fix allows multi-GPU to work OK? Thanks. I think it's CCE-related but unsure
Do you mean the logging that I provided, or do you want to use a reward model? GRPO in general doesn't use a reward model - it calculates advantages...
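To illustrate the advantage calculation mentioned above: GRPO normalizes rule-based rewards within each group of sampled completions rather than scoring with a learned reward model. A minimal sketch (the function name and epsilon are assumptions for illustration):

```python
import numpy as np

def grpo_advantages(rewards):
    # Group-relative advantages: normalize each completion's reward
    # against the mean and std of its own sampling group.
    rewards = np.asarray(rewards, dtype=float)
    # Epsilon guards against a zero-variance group (all rewards equal).
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Four completions for one prompt, scored by a rule-based reward:
adv = grpo_advantages([1.0, 0.0, 0.5, 1.0])
```

Completions above the group mean get positive advantages, those below get negative ones, so no separate value or reward network is needed.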
Yes working on it!
Hopefully out soon.
@nottrz @DiTo97 @Summer142857 Apologies - just fixed! For Colab / Kaggle, please restart and run all. For local machines, please do:
```
pip install --force-reinstall --upgrade --no-cache-dir --no-deps unsloth unsloth_zoo...
```