rlhf-book

Chapter Plans

Open natolambert opened this issue 1 year ago • 0 comments

Here is a rough outline of what I would like to see in the book, and who will be writing it.

Introductions & History

Introduction
Economics, psychology, and philosophy of preference: VNM theory, Bradley-Terry, impossibility theorems, social choice, etc.
Optimal control, deep RL, ML, etc.
RLHF for LLMs literature (pre-ChatGPT work), maybe summarize InstructGPT

Links:

https://arxiv.org/abs/2310.13595

Problem Specification

Definitions, basic stuff, math
Preference data collection
Preference model training
KL constraints and other penalties

Policy Optimization

IFT / SFT / Chat Templates
Rejection Sampling / Best of N
PPO, REINFORCE, Policy Gradient
DPO (Eric, Archit, Rafael)
Other variants (short)

Advanced (optional)

Constitutional AI (CAI)
Synthetic vs human data
Evaluation

Open Questions (TBD / optional)

Reward model over-optimization

Jun 01 '24 15:06 natolambert