Yu-won Lee

230 comments of Yu-won Lee

https://github.com/2U1/Qwen2-VL-Finetune/issues/154#issuecomment-3002261699 This could be the answer to the question.

Maybe it could be a problem with 1) the reward score, 2) non-EOS generation, or 3) deterministic sampling. You could add the debug script (see the sketch after it):
```
@profiling_decorator
def compute_loss(self, model, inputs, return_outputs=False,...
```
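Since the snippet above is truncated, here is a minimal, hedged sketch that probes the same three suspects from outside the trainer, by wrapping a TRL-style reward function (`(prompts, completions, **kwargs) -> list[float]`). `NUM_GENERATIONS` and `eos_token` are assumptions; set them to your own config.

```python
# Debugging sketch, not the trainer's own code: wraps a reward function
# so you can inspect rewards, EOS behaviour, and sampling diversity.
NUM_GENERATIONS = 2  # assumption: set to GRPOConfig.num_generations

def debug_reward(reward_func, eos_token="</s>"):  # eos_token is hypothetical
    def wrapped(prompts, completions, **kwargs):
        rewards = reward_func(prompts, completions, **kwargs)
        # In TRL, the generations for one prompt are consecutive in the batch.
        for start in range(0, len(rewards), NUM_GENERATIONS):
            group_r = rewards[start:start + NUM_GENERATIONS]
            group_c = completions[start:start + NUM_GENERATIONS]
            # 1. reward score: identical rewards give a zero advantage.
            if len(set(group_r)) == 1:
                print(f"[debug] group {start // NUM_GENERATIONS}: all rewards = {group_r[0]}")
            # 2. non-EOS generation: rough check only, since decoded
            #    completions often have special tokens stripped already.
            for c in group_c:
                if isinstance(c, str) and not c.endswith(eos_token):
                    print(f"[debug] completion may lack EOS: {c[-40:]!r}")
            # 3. deterministic sampling: every completion identical.
            if len(set(map(str, group_c))) == 1:
                print("[debug] identical completions -> sampling looks deterministic")
        return rewards
    return wrapped
```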

Your LoRA layers are correctly attached (trainable = 392), but the advantage is always zero because the two completions in each group receive the same reward. Increase sampling diversity (temperature, top_p,...
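To see why identical rewards kill the learning signal, here is a tiny sketch of the group-relative advantage GRPO uses (a simplified version of the normalization; the epsilon value is an assumption):

```python
import torch

# Each reward is normalized against its group's mean and std. With two
# identical rewards per group the advantage is exactly zero, so there is
# no gradient signal for that group.
rewards = torch.tensor([[0.5, 0.5],   # group 0: same reward -> zero advantage
                        [0.0, 1.0]])  # group 1: diverse rewards -> nonzero
mean = rewards.mean(dim=1, keepdim=True)
std = rewards.std(dim=1, keepdim=True)
advantages = (rewards - mean) / (std + 1e-4)
print(advantages)  # row 0 is all zeros; row 1 is roughly [-0.7, 0.7]
```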

The warning itself does not change the behaviour of your run – your CLI flags (--top_k 50 --top_p 1.0 …) still override the defaults when the Trainer builds its own...
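As a rough illustration (not the trainer's actual code path), explicitly passed values win over the saved defaults when a `GenerationConfig` is constructed; the stand-in `Args` class below mimics parsed CLI flags:

```python
from transformers import GenerationConfig

class Args:  # stand-in for parsed CLI flags (hypothetical values)
    top_k, top_p, temperature = 50, 1.0, 1.0

args = Args()
gen_config = GenerationConfig(
    do_sample=True,
    top_k=args.top_k,            # from --top_k 50
    top_p=args.top_p,            # from --top_p 1.0
    temperature=args.temperature,
)
print(gen_config.top_k, gen_config.top_p)  # 50 1.0, defaults overridden
```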

I've made an update to the generation config; I think it wasn't being applied properly. You could just copy & paste the grpo_trainer code and retry.

I think you should check the completions that the models are making. If the completions are identical, then the generation config isn't working. If they are similar, then the...
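For a quick identical-vs-similar check on two completions from one group, something like this works (the strings are hypothetical):

```python
from difflib import SequenceMatcher

# completions: the two decoded strings sampled for the same prompt.
completions = ["The answer is 4.", "The answer is 4!"]  # hypothetical
if completions[0] == completions[1]:
    print("identical -> generation config likely not applied")
else:
    ratio = SequenceMatcher(None, completions[0], completions[1]).ratio()
    print(f"similarity: {ratio:.2f} -> sampling works but may lack diversity")
```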

Currently, I'm using the default logic of trl for GRPO, so it doesn't support videos for now. I'll make an update for that.

I've updated the code to support videos in GRPO. Also, you should now add the `` token in the dataset for video training with GRPO.
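For illustration, a single video sample might look like the sketch below; the `<video>` placeholder, the field names, and the path are all assumptions, so check the repo's README for the exact token and schema:

```python
import json

# One hypothetical training sample for video GRPO, written in the
# LLaVA-style conversations schema this repo's JSON datasets use.
sample = {
    "video": "videos/clip_0001.mp4",  # hypothetical path
    "conversations": [
        {"from": "human", "value": "<video>\nWhat happens in this clip?"},
        {"from": "gpt", "value": "A person pours coffee into a mug."},
    ],
}
with open("train_video.json", "w") as f:
    json.dump([sample], f, indent=2)
```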

Well, it doesn't support a feature for that, but you could just merge the JSON files to do it. If there is some reason that you can't merge the json...
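Merging is just concatenating the top-level lists, assuming each JSON file holds a list of samples (the file names below are hypothetical):

```python
import json

# Concatenate several dataset JSON files into a single training file.
merged = []
for path in ["dataset_a.json", "dataset_b.json"]:
    with open(path) as f:
        merged.extend(json.load(f))

with open("merged.json", "w") as f:
    json.dump(merged, f, indent=2)
```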