LLM-with-RL-papers
add rrhf
RRHF: Rank Responses to Align Language Models with Human Feedback without tears