
Support simple, synchronous, CLI or Jupyter human preference collection

Open • timbauman opened this issue 2 years ago • 0 comments

Problem

Today only synthetic preferences are supported. It would be great to support real human preferences.

Solution

Requirements:

  • record videos of trajectories
  • ideally extensible, so we could factor it out into a more elaborate, asynchronous service someday
  • simple enough that it can be built in a day

MVP:

  • In a Jupyter notebook or CLI
  • Synchronous

Steps:

  • [ ] Inject option to store videos into training code (looks like I can just use VecEnv?)
  • [ ] Build new interface/class for storing videos (so we could configure this to store in different directories or on the cloud, for example)
  • [ ] Build new gatherer that requests user feedback
  • [ ] Display videos to users in Jupyter or CLI
  • [ ] Clean up watched/unneeded videos
  • [ ] Build demo notebook
  • [ ] Integrate into training script and test end to end
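
The storage and gatherer steps above might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the library's API: `VideoStore`, `LocalVideoStore`, `SyncHumanGatherer`, the `ask` parameter, and `demo` are hypothetical names, each fragment pair is assumed to arrive as two already-rendered videos in bytes, and the encoding 1.0 / 0.0 / 0.5 (first preferred / second preferred / tie) is an assumption of this sketch.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable, Protocol, Sequence


class VideoStore(Protocol):
    """Hypothetical storage interface; a cloud backend could implement it too."""

    def save(self, name: str, data: bytes) -> str: ...

    def delete(self, location: str) -> None: ...


class LocalVideoStore:
    """Stores videos as files under a local directory."""

    def __init__(self, root: str | Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, data: bytes) -> str:
        path = self.root / name
        path.write_bytes(data)
        return str(path)

    def delete(self, location: str) -> None:
        Path(location).unlink(missing_ok=True)


class SyncHumanGatherer:
    """Hypothetical gatherer that blocks on each pair until the user answers."""

    # Assumed encoding: 1.0 = first video preferred, 0.0 = second, 0.5 = tie.
    _CHOICES = {"a": 1.0, "b": 0.0, "t": 0.5}

    def __init__(self, store: VideoStore, ask: Callable[[str], str] = input) -> None:
        self.store = store
        self.ask = ask  # defaults to input(), which works in a CLI and in Jupyter

    def __call__(self, pairs: Sequence[tuple[bytes, bytes]]) -> list[float]:
        prefs = []
        for i, (left, right) in enumerate(pairs):
            a = self.store.save(f"pair{i}_a.mp4", left)
            b = self.store.save(f"pair{i}_b.mp4", right)
            try:
                prefs.append(self._ask_one(a, b))
            finally:
                # Clean up watched videos even if the prompt is interrupted.
                self.store.delete(a)
                self.store.delete(b)
        return prefs

    def _ask_one(self, a: str, b: str) -> float:
        prompt = f"Watch {a} and {b}. Prefer [a], [b], or [t]ie? "
        while True:  # re-prompt until the answer is one of the known choices
            answer = self.ask(prompt).strip().lower()
            if answer in self._CHOICES:
                return self._CHOICES[answer]


def demo() -> list[float]:
    """Run the gatherer with scripted answers instead of a real user."""
    answers = iter(["a", "x", "t"])  # "x" is invalid and triggers a re-prompt
    with tempfile.TemporaryDirectory() as tmp:
        gatherer = SyncHumanGatherer(LocalVideoStore(tmp), ask=lambda _: next(answers))
        return gatherer([(b"v1", b"v2"), (b"v3", b"v4")])
```

Keeping the store behind a small interface is what would let the same gatherer write to a different directory or to cloud storage later; displaying the video inline in Jupyter is deliberately left out here.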

Possible alternative solutions

Slightly more than MVP (out of scope for this issue):

  • Refactor to support asynchronous preference gathering
  • Separate requesting preferences from receiving them
  • Periodically retrain with new preferences
  • Would be nice to have a way to indicate that new preferences are available
  • Might require changing the fragmenter, e.g. to "just show the most recent pair to the user" rather than all pairs
  • Build asynchronous
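
One way the "separate requesting from receiving" idea above could look is a small in-process mailbox. This is only a sketch under assumptions: `Query`, `PreferenceInbox`, and all method names are hypothetical, and a real asynchronous service would likely sit behind a network API rather than a `queue.Queue`.

```python
from __future__ import annotations

import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A request for one preference between two stored videos."""

    query_id: int
    video_a: str
    video_b: str


class PreferenceInbox:
    """Hypothetical mailbox that decouples requesting from receiving."""

    def __init__(self) -> None:
        self._pending: dict[int, Query] = {}
        self._answers = queue.Queue()  # holds (Query, preference) tuples
        self._next_id = 0

    def request(self, video_a: str, video_b: str) -> Query:
        """Register a query and return immediately, without waiting for an answer."""
        query = Query(self._next_id, video_a, video_b)
        self._pending[query.query_id] = query
        self._next_id += 1
        return query

    def submit(self, query_id: int, preference: float) -> None:
        """Record an answer; raises KeyError for unknown or already answered IDs."""
        query = self._pending.pop(query_id)
        self._answers.put((query, preference))

    def has_new(self) -> bool:
        """Indicate whether new preferences are available."""
        return not self._answers.empty()

    def drain(self) -> list[tuple[Query, float]]:
        """Return all answers received so far, e.g. before a periodic retrain."""
        received = []
        while True:
            try:
                received.append(self._answers.get_nowait())
            except queue.Empty:
                return received
```

A training loop could then call `has_new()` between updates and retrain only when `drain()` returns something, which lines up with the "periodically retrain with new preferences" item.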

timbauman • May 09 '23 23:05