Yu-won Lee

230 comments of Yu-won Lee

https://github.com/2U1/Qwen2-VL-Finetune/issues/154#issuecomment-3002261699 This could be the answer to the question.

Maybe it could be a problem with 1) the reward score, 2) non-EOS generation, or 3) deterministic sampling. You could add the debug script (see the sketch after it):
```
@profiling_decorator
def compute_loss(self, model, inputs, return_outputs=False,...
```
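Since the snippet above is truncated, here is a minimal, hedged sketch that probes the same three suspects from outside the trainer, by wrapping a TRL-style reward function (`(prompts, completions, **kwargs) -> list[float]`). `NUM_GENERATIONS` and `eos_token` are assumptions; set them to your own config.

```python
# Debugging sketch, not the trainer's own code: wraps a reward function
# so you can inspect rewards, EOS behaviour, and sampling diversity.
NUM_GENERATIONS = 2  # assumption: set to GRPOConfig.num_generations

def debug_reward(reward_func, eos_token="</s>"):  # eos_token is hypothetical
    def wrapped(prompts, completions, **kwargs):
        rewards = reward_func(prompts, completions, **kwargs)
        # In TRL, the generations for one prompt are consecutive in the batch.
        for start in range(0, len(rewards), NUM_GENERATIONS):
            group_r = rewards[start:start + NUM_GENERATIONS]
            group_c = completions[start:start + NUM_GENERATIONS]
            # 1. reward score: identical rewards give a zero advantage.
            if len(set(group_r)) == 1:
                print(f"[debug] group {start // NUM_GENERATIONS}: all rewards = {group_r[0]}")
            # 2. non-EOS generation: rough check only, since decoded
            #    completions often have special tokens stripped already.
            for c in group_c:
                if isinstance(c, str) and not c.endswith(eos_token):
                    print(f"[debug] completion may lack EOS: {c[-40:]!r}")
            # 3. deterministic sampling: every completion identical.
            if len(set(map(str, group_c))) == 1:
                print("[debug] identical completions -> sampling looks deterministic")
        return rewards
    return wrapped
```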

Your LoRA layers are correctly attached (trainable = 392), but the advantage is always zero because the two completions in each group receive the same reward. Increase sampling diversity (temperature, top_p,...
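To see why identical rewards kill the learning signal, here is a tiny sketch of the group-relative advantage GRPO uses (a simplified version of the normalization; the epsilon value is an assumption):

```python
import torch

# Each reward is normalized against its group's mean and std. With two
# identical rewards per group the advantage is exactly zero, so there is
# no gradient signal for that group.
rewards = torch.tensor([[0.5, 0.5],   # group 0: same reward -> zero advantage
                        [0.0, 1.0]])  # group 1: diverse rewards -> nonzero
mean = rewards.mean(dim=1, keepdim=True)
std = rewards.std(dim=1, keepdim=True)
advantages = (rewards - mean) / (std + 1e-4)
print(advantages)  # row 0 is all zeros; row 1 is roughly [-0.7, 0.7]
```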

The warning itself does not change the behaviour of your run – your CLI flags (--top_k 50 --top_p 1.0 …) still override the defaults when the Trainer builds its own...
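As a rough illustration (not the trainer's actual code path), explicitly passed values win over the saved defaults when a `GenerationConfig` is constructed; the stand-in `Args` class below mimics parsed CLI flags:

```python
from transformers import GenerationConfig

class Args:  # stand-in for parsed CLI flags (hypothetical values)
    top_k, top_p, temperature = 50, 1.0, 1.0

args = Args()
gen_config = GenerationConfig(
    do_sample=True,
    top_k=args.top_k,            # from --top_k 50
    top_p=args.top_p,            # from --top_p 1.0
    temperature=args.temperature,
)
print(gen_config.top_k, gen_config.top_p)  # 50 1.0, defaults overridden
```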

I've made an update to the generation config; I think it wasn't being applied properly. You could just copy & paste the grpo_trainer code and retry.

I think you should check the completions that the models are making. If the completions are identical, then the generation config isn't working. If they are similar, then the...
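For a quick identical-vs-similar check on two completions from one group, something like this works (the strings are hypothetical):

```python
from difflib import SequenceMatcher

# completions: the two decoded strings sampled for the same prompt.
completions = ["The answer is 4.", "The answer is 4!"]  # hypothetical
if completions[0] == completions[1]:
    print("identical -> generation config likely not applied")
else:
    ratio = SequenceMatcher(None, completions[0], completions[1]).ratio()
    print(f"similarity: {ratio:.2f} -> sampling works but may lack diversity")
```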

Currently, I'm using the default logic of trl for GRPO, so it doesn't support videos for now. I'll make an update for that.

I've updated the code to support videos in GRPO. Also, you should now add the `` token in the dataset for video training with GRPO.
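For illustration, a single video sample might look like the sketch below; the `<video>` placeholder, the field names, and the path are all assumptions, so check the repo's README for the exact token and schema:

```python
import json

# One hypothetical training sample for video GRPO, written in the
# LLaVA-style conversations schema this repo's JSON datasets use.
sample = {
    "video": "videos/clip_0001.mp4",  # hypothetical path
    "conversations": [
        {"from": "human", "value": "<video>\nWhat happens in this clip?"},
        {"from": "gpt", "value": "A person pours coffee into a mug."},
    ],
}
with open("train_video.json", "w") as f:
    json.dump([sample], f, indent=2)
```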

Well, it doesn't support a feature for that, but you could just merge the JSON files to do it. If there is some reason that you can't merge the json...
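Merging is just concatenating the top-level lists, assuming each JSON file holds a list of samples (the file names below are hypothetical):

```python
import json

# Concatenate several dataset JSON files into a single training file.
merged = []
for path in ["dataset_a.json", "dataset_b.json"]:
    with open(path) as f:
        merged.extend(json.load(f))

with open("merged.json", "w") as f:
    json.dump(merged, f, indent=2)
```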