PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT, but with PaLM.

Results: 23 PaLM-rlhf-pytorch issues

Hi, I am confused about the 'value function' in the InstructGPT paper. The paper says: "As previously mentioned, for all PPO models we use a 6B RM and...

Hi, I am confused: the loss function of ChatGPT's reward model takes the difference between the scores of two responses as input and then passes it through a sigmoid function. However, the loss function...
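For context, the pairwise loss described in the InstructGPT paper is the negative log-sigmoid of the reward difference between the preferred and the rejected response. The snippet below is a minimal sketch of that formula only; the function name `pairwise_reward_loss` and the toy reward values are hypothetical and are not taken from this repository.

```python
import torch
import torch.nn.functional as F


def pairwise_reward_loss(reward_chosen, reward_rejected):
    """Hypothetical helper: -log(sigmoid(r_chosen - r_rejected)), averaged over pairs."""
    # logsigmoid is numerically more stable than log(sigmoid(x))
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()


# Toy scalar rewards for two comparison pairs (made-up values)
chosen = torch.tensor([2.0, 0.5])
rejected = torch.tensor([1.0, 1.5])
loss = pairwise_reward_loss(chosen, rejected)
```

The loss shrinks as the preferred response is scored further above the rejected one, and equals log 2 when both receive the same score.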

https://github.com/lucidrains/PaLM-rlhf-pytorch/blob/6b02ee329106baff78e293afa7d1d2e6dd4e5ca2/palm_rlhf_pytorch/utils.py#L60 Using the sorted indices to index the sorted indices does not make sense. I think it should be `return logits.scatter(1, sorted_indices, sorted_logits)`.
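To make the `scatter` semantics concrete, here is a small sketch showing how values sorted with `torch.sort` can be put back in their original positions. The helper name `unsort` and the toy tensor are assumptions for illustration and do not reproduce the repository's `utils.py`.

```python
import torch


def unsort(sorted_values, sorted_indices):
    """Hypothetical helper: undo a torch.sort along dim 1 using scatter."""
    # scatter writes src[i][j] to out[i][index[i][j]] along dim 1
    out = torch.empty_like(sorted_values)
    return out.scatter(1, sorted_indices, sorted_values)


# Toy logits, one row per batch element (made-up values)
logits = torch.tensor([[1.0, 3.0, 2.0], [0.5, -1.0, 4.0]])
sorted_logits, sorted_indices = torch.sort(logits, dim=-1, descending=True)

restored = unsort(sorted_logits, sorted_indices)

# Each row of sorted_indices is a permutation, so every output slot is
# overwritten and the tensor scatter is called on does not change the result.
from_logits = logits.scatter(1, sorted_indices, sorted_logits)
from_sorted = sorted_logits.scatter(1, sorted_indices, sorted_logits)
```

In this sketch `restored`, `from_logits`, and `from_sorted` all equal the original `logits`, since out-of-place `scatter` with a full permutation index replaces every element of the base tensor.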