Yu-won Lee

Results 230 comments of Yu-won Lee

Thanks I'm trying to add DPO and GRPO both of it. But I think I should study about how the DPO and GRPO code works in the trl library.

Thanks for the usefule page! I'll check the codes you gave me and the links to make one.

Sorry I'm late. I've updated the DPO and now its time for GRPO.

Sorry for the long waiting. I've added a code for training the model with GRPO. Feedbacks and PRs are always welcome :)

I can't exactly understand the question. Each data with a conversation is treated as a one input. So for a conversation with a single image will pass throgh foraward once...

It's maintained. The input is passed in a very long sequence if the conversation has lot of turns.

@R3xpook Not planned, but I'll try.

@R3xpook I've updated the code for DPO. Sorry it took so long.

I think the chat template has changed after I made this repo. I think it wouldn't affect the performance but, in case it could so, I've updated the chat teamplate...

Thanks for the report. I think removing the schedular part would make the scheduling work again.