Yu-won Lee comments

Results 230 comments of


                                            Yu-won Lee

Any pointers as to how I could integrate TRL GRPO into the mix?

Thanks I'm trying to add DPO and GRPO both of it. But I think I should study about how the DPO and GRPO code works in the trl library.

Any pointers as to how I could integrate TRL GRPO into the mix?

Thanks for the usefule page! I'll check the codes you gave me and the links to make one.

Any pointers as to how I could integrate TRL GRPO into the mix?

Sorry I'm late. I've updated the DPO and now its time for GRPO.

Any pointers as to how I could integrate TRL GRPO into the mix?

Sorry for the long waiting. I've added a code for training the model with GRPO. Feedbacks and PRs are always welcome :)

Noob question

I can't exactly understand the question. Each data with a conversation is treated as a one input. So for a conversation with a single image will pass throgh foraward once...

Noob question

It's maintained. The input is passed in a very long sequence if the conversation has lot of turns.

Noob question

@R3xpook Not planned, but I'll try.

Noob question

@R3xpook I've updated the code for DPO. Sorry it took so long.

A little discrepancy of chat templates (bug?)

I think the chat template has changed after I made this repo. I think it wouldn't affect the performance but, in case it could so, I've updated the chat teamplate...

A little discrepancy of chat templates (bug?)

Thanks for the report. I think removing the schedular part would make the scheduling work again.