PaLM-rlhf-pytorch

The loss function of the reward model

Open huzechuan opened this issue 2 years ago • 2 comments

Hi, I am confused: the loss function of ChatGPT's reward model takes as input the difference between the rewards of two responses and passes it through a sigmoid. However, the loss function in this repo takes only one response as input and uses its ranking score as a label to compute a cross-entropy (CE) loss. Is there an advantage to this?
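
For reference, the pairwise loss described above (from the InstructGPT paper) can be sketched roughly as follows; the function and tensor names are illustrative, not code from this repo:

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(reward_chosen, reward_rejected):
    # InstructGPT-style pairwise loss: -log sigmoid(r_chosen - r_rejected),
    # which pushes the preferred response's reward above the rejected one's.
    # logsigmoid is the numerically stable form of log(sigmoid(x)).
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# toy usage: scalar rewards for a batch of 4 preference pairs
reward_chosen = torch.randn(4, requires_grad=True)
reward_rejected = torch.randn(4, requires_grad=True)
loss = pairwise_reward_loss(reward_chosen, reward_rejected)
loss.backward()
```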

huzechuan avatar Jan 31 '23 12:01 huzechuan

@huzechuan i have to admit i haven't totally digested the way they derive their reward values for training

but at the moment, even if their reward is derived from a collection of sampled responses, this repository doesn't lock you into any one method, as you can do your second step (training the reward model) from any <sequence, reward value> pair, which you define

i guess i'll have to worry about this once i build out the application for sampling from some version of the model and collecting the ratings, so do let me know in detail the optimal approach they discovered. i just think there are other applications beyond text that this could be used for (rl, protein design) that do not necessarily need this sigmoid-of-difference approach
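
As a concrete illustration of training from <sequence, reward value> pairs as described above, here is a minimal sketch of the single-response formulation mentioned in the question, with cross entropy over discrete rating labels; `ToyRewardModel`, its dimensions, and the number of rating bins are hypothetical stand-ins, not this repository's actual classes:

```python
import torch
import torch.nn.functional as F
from torch import nn

class ToyRewardModel(nn.Module):
    # a deliberately tiny stand-in for a transformer-based reward model
    def __init__(self, vocab_size=256, dim=64, num_reward_bins=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.to_reward_logits = nn.Linear(dim, num_reward_bins)

    def forward(self, token_ids):
        # mean-pool token embeddings, then project to logits over reward bins
        pooled = self.embed(token_ids).mean(dim=1)
        return self.to_reward_logits(pooled)

model = ToyRewardModel()
sequences = torch.randint(0, 256, (8, 32))   # batch of 8 token sequences
reward_labels = torch.randint(0, 5, (8,))    # one human rating bin per sequence
loss = F.cross_entropy(model(sequences), reward_labels)
loss.backward()
```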

lucidrains avatar Jan 31 '23 17:01 lucidrains

> Hi, I am confused: the loss function of ChatGPT's reward model takes as input the difference between the rewards of two responses and passes it through a sigmoid. However, the loss function in this repo takes only one response as input and uses its ranking score as a label to compute a cross-entropy (CE) loss. Is there an advantage to this?

I have the same confusion

yangjianxin1 avatar Feb 12 '23 09:02 yangjianxin1