Thewillman
> I couldn't solve it and still have this problem. Have you solved it? I use SupConLoss on my dataset with a batch size of 128, and the loss doesn't decrease.
The reward accuracies just hover around 0.5, which means that in some steps the chosen rewards can be smaller than the rejected rewards.
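To make the metric concrete, here is a minimal sketch of how I understand it, assuming the reward accuracy is the fraction of pairs in a batch where the chosen reward is strictly greater than the rejected reward. The function name `reward_accuracy` and the numbers are made up for illustration; they are not taken from any library or from my actual run.

```python
def reward_accuracy(chosen_rewards, rejected_rewards):
    """Return the fraction of pairs where chosen reward > rejected reward.

    Assumption: this mirrors the usual definition of the metric;
    it is an illustrative helper, not a library function.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("chosen and rejected must have the same length")
    if not chosen_rewards:
        raise ValueError("need at least one pair")
    # Count pairs where the model prefers the chosen response.
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)


# Hypothetical per-pair rewards for one batch of four preference pairs.
chosen = [0.8, -0.1, 0.3, 0.05]
rejected = [0.2, 0.4, 0.1, 0.30]

accuracy = reward_accuracy(chosen, rejected)
print(accuracy)
```

Under this definition, a value near 0.5 means the model ranks the chosen response above the rejected one only about half the time, which is what random guessing would give.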
> Sorry, but despite my best efforts I can't understand your question. You're talking about similar prompts in a list, about modifying the codebase without providing us with your modifications,...
> > Sorry, but despite my best efforts I can't understand your question. You're talking about similar prompts in a list, about modifying the codebase without providing us with your...