Thewillman comments

Results: 4 comments of Thewillman

Non-decreasing loss for custom dataset

> I couldn't solve it and still have this problem. Have you solved it? I use SupConLoss on my dataset with batchsize=128 and the loss doesn't decrease.

Has anyone faced the problem of DPO reward accuracy stuck at 0.5 and the loss stuck at 0.6 to 0.8?

The reward accuracies just hover around 0.5, which means that in some steps the chosen rewards can be smaller than the rejected rewards.
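For anyone trying to interpret these numbers, here is a minimal sketch of how the loss and reward accuracy are typically computed, assuming the standard sigmoid DPO objective with implicit rewards `beta * (policy_logp - ref_logp)`. The function name `dpo_metrics`, the value `beta=0.1`, and the log-probabilities are made up for illustration and do not come from any particular codebase. Under that assumption, a reward margin near zero gives a loss near ln 2 ≈ 0.693, which would be consistent with a loss in the 0.6 to 0.8 range alongside an accuracy near 0.5, i.e. the policy has barely moved away from the reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Return (mean loss, reward accuracy) over a batch of preference pairs.

    Hypothetical helper: assumes the standard sigmoid DPO loss, where each
    argument is a list of summed sequence log-probabilities.
    """
    losses = []
    correct = 0
    for pc, pr, rc, rr in zip(policy_chosen_logps, policy_rejected_logps,
                              ref_chosen_logps, ref_rejected_logps):
        chosen_reward = beta * (pc - rc)      # implicit reward of the chosen answer
        rejected_reward = beta * (pr - rr)    # implicit reward of the rejected answer
        margin = chosen_reward - rejected_reward
        losses.append(-log_sigmoid(margin))   # per-pair DPO loss
        correct += int(chosen_reward > rejected_reward)
    n = len(losses)
    return sum(losses) / n, correct / n


# Illustrative values only: the policy is almost identical to the reference,
# so one pair has a tiny positive margin and the other a tiny negative one.
loss, acc = dpo_metrics(
    policy_chosen_logps=[-10.0, -12.0],
    policy_rejected_logps=[-11.0, -11.0],
    ref_chosen_logps=[-10.1, -12.0],
    ref_rejected_logps=[-11.0, -11.1],
)
print(f"loss={loss:.4f} accuracy={acc:.2f} ln2={math.log(2):.4f}")
```

If your logged metrics look like this, it may be worth checking whether the margins are actually changing over training (learning rate, `beta`, frozen parameters, or chosen and rejected answers that are nearly identical), though the right diagnosis depends on your setup.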

Has anyone faced the problem of DPO reward accuracy stuck at 0.5 and the loss stuck at 0.6 to 0.8?

> Sorry, but despite my best efforts I can't understand your question. You're talking about similar prompts in a list, about modifying the codebase without providing us with your modifications,...

Has anyone faced the problem of DPO reward accuracy stuck at 0.5 and the loss stuck at 0.6 to 0.8?

> > Sorry, but despite my best efforts I can't understand your question. You're talking about similar prompts in a list, about modifying the codebase without providing us with your...