Chapter Plans

Open • natolambert opened this issue 1 year ago • 0 comments

Here is a rough outline of what I would like to see in the book, and who will be writing it.

Introductions & History

  1. Introduction
  2. Economics, psychology, and philosophy of preference: VNM theory, Bradley-Terry, impossibility theorems, social choice, etc.
  3. Optimal control, deep RL, ML, etc.
  4. RLHF for LLM literature (pre-ChatGPT work), maybe summarize InstructGPT

Links:

  • https://arxiv.org/abs/2310.13595

Problem Specification

  1. Definitions, basic stuff, math
  2. Preference data collection
  3. Preference model training
  4. KL constraints and other penalties

Policy Optimization

  1. IFT / SFT / Chat Templates
  2. Rejection Sampling / Best of N
  3. PPO, REINFORCE, Policy Gradient
  4. DPO (Eric, Archit, Rafael)
  5. Other variants (short)

Advanced (optional)

  1. CAI
  2. Synthetic vs human data
  3. Evaluation

Open Questions (TBD / optional)

  1. Reward model over-optimization

natolambert • Jun 01 '24 15:06