PaLM-rlhf-pytorch issues

Results: 23 issues


Value function

Hi, I am confused about the 'value function' in the InstructGPT paper. The paper says: "As previously mentioned, for all PPO models we use a 6B RM and...

tonylin52

The loss function of the reward model.

Hi, I am confused: the loss function of ChatGPT's reward model takes as input the difference between the scores of two responses and then passes it through a sigmoid function. However, the loss function...

huzechuan
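For reference, the pairwise ranking loss described in the InstructGPT paper is loss = -log(sigmoid(r(x, y_w) - r(x, y_l))), where y_w is the preferred response and y_l the rejected one. The sigmoid is applied to the difference of the two scalar rewards, not to the responses themselves. The sketch below is a minimal, framework-free illustration that assumes both rewards have already been computed; the names `pairwise_reward_loss` and `batch_loss` are hypothetical and not taken from PaLM-rlhf-pytorch.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Return -log(sigmoid(r_chosen - r_rejected)) for one comparison.

    Hypothetical helper for illustration. -log(sigmoid(d)) equals
    softplus(-d) = log(1 + exp(-d)), evaluated in a form that never
    overflows and never takes log(0).
    """
    diff = r_chosen - r_rejected
    x = -diff
    if x > 0:
        # log(1 + e^x) = x + log(1 + e^-x), safe when x is large
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def batch_loss(pairs):
    """Mean pairwise loss over (r_chosen, r_rejected) tuples (hypothetical helper)."""
    return sum(pairwise_reward_loss(c, r) for c, r in pairs) / len(pairs)


# Equal rewards give log(2); a large margin in favour of the chosen
# response drives the loss toward 0, the opposite margin grows it linearly.
print(pairwise_reward_loss(0.0, 0.0))
print(pairwise_reward_loss(10.0, 0.0))
print(pairwise_reward_loss(0.0, 10.0))
```

Computing the loss as softplus(-diff) instead of literally taking the log of a sigmoid avoids overflow for large reward gaps. In PyTorch the same quantity is commonly written as `-F.logsigmoid(r_chosen - r_rejected).mean()`.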

A bug in the implementation of top-p sampling

https://github.com/lucidrains/PaLM-rlhf-pytorch/blob/6b02ee329106baff78e293afa7d1d2e6dd4e5ca2/palm_rlhf_pytorch/utils.py#L60

Using the sorted indices to index the sorted indices does not make sense. I think the line should be `return logits.scatter(1, sorted_indices, sorted_logits)`.

allblueJT
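Why the final scatter matters: after sorting, position j of `sorted_logits` holds the logit of token `sorted_indices[j]`, so the filtered values have to be written back with out[sorted_indices[j]] = sorted_logits[j]. That is what `logits.scatter(1, sorted_indices, sorted_logits)` does along dim 1. The following is a minimal 1-D, pure-Python sketch of nucleus (top-p) filtering with that scatter-back step. `top_p_filter` and its parameter `p` (the nucleus probability mass to keep) are hypothetical, and this threshold convention may differ from the one used in the repository's `utils.py`.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(logits, p=0.9):
    """Hypothetical sketch: keep the smallest prefix of highest-probability
    tokens whose cumulative probability reaches p, set the rest to -inf,
    and return the result in the ORIGINAL token order."""
    # sorted_indices[j] is the original position of the j-th largest logit.
    sorted_indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    sorted_logits = [logits[i] for i in sorted_indices]

    probs = softmax(sorted_logits)
    filtered = []
    cum = 0.0
    for j, (logit, prob) in enumerate(zip(sorted_logits, probs)):
        # Always keep the top token; drop a token once the mass of the
        # tokens ranked above it has already reached p.
        keep = j == 0 or cum < p
        filtered.append(logit if keep else float("-inf"))
        cum += prob

    # Scatter back: out[sorted_indices[j]] = filtered[j]. This is the 1-D
    # analogue of logits.scatter(1, sorted_indices, sorted_logits).
    out = [0.0] * len(logits)
    for j, idx in enumerate(sorted_indices):
        out[idx] = filtered[j]
    return out


print(top_p_filter([1.0, 3.0, 2.0, 0.0], p=0.8))
```

For example, `top_p_filter([1.0, 3.0, 2.0, 0.0], p=0.8)` keeps only tokens 1 and 2 and returns `[-inf, 3.0, 2.0, -inf]` in the original token order. Returning the filtered list without the scatter-back step would instead give `[3.0, 2.0, -inf, -inf]`, attaching the top logit to token 0.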
