rl
[Feature Request] multi-turn reward for RLHF
Implement rewards as proposed in https://arxiv.org/pdf/2405.14655
I am very interested in multi-turn RLHF. Could you share some sample code?
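As a starting point, here is a minimal sketch of one common way to turn a conversation-level score into per-turn rewards. It is not the algorithm from the linked paper and not a TorchRL API. It assumes one scalar reward for the whole conversation (for example from a hypothetical conversation-level reward model), places it on the final turn, and propagates it back to earlier turns as discounted returns. `terminal_reward_per_turn` and `discounted_returns` are hypothetical helpers written only for illustration.

```python
from typing import List


def terminal_reward_per_turn(num_turns: int, conversation_reward: float) -> List[float]:
    """Hypothetical helper: put the whole-conversation reward on the last turn.

    Earlier turns receive zero immediate reward (a sparse, terminal reward).
    """
    if num_turns < 1:
        raise ValueError("a conversation needs at least one turn")
    rewards = [0.0] * num_turns
    rewards[-1] = conversation_reward
    return rewards


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Hypothetical helper: G_t = r_t + gamma * G_{t+1}, computed from the last turn backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # A 3-turn dialogue whose full transcript was scored 1.0 (assumed score).
    per_turn = terminal_reward_per_turn(num_turns=3, conversation_reward=1.0)
    print(per_turn)  # [0.0, 0.0, 1.0]
    print(discounted_returns(per_turn, gamma=0.5))  # [0.25, 0.5, 1.0]
```

With `gamma=1.0` every turn gets the same credit as the final outcome; a smaller `gamma` gives earlier turns less credit. The paper's preference-based approach may assign credit differently, so this is only a baseline to compare against.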
@vmoens I am interested in this. Is there any progress? I am ready to collaborate or to start from scratch.