imitation
# Support simple, synchronous CLI or Jupyter human preference collection
## Problem
Today only synthetic preferences are supported. It would be great to support real human preferences.
## Solution
Requirements:
- record videos of trajectories
- ideally, extensible so we could factor out into a more elaborate, asynchronous service someday
- simple enough that it can be built in a day
MVP:
- In a Jupyter notebook or CLI
- Synchronous
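The "extensible" requirement above (local directories now, a cloud or async service someday) suggests putting video storage behind a small interface. A minimal sketch — all class and method names here are hypothetical, not part of imitation's API:

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class VideoStore(ABC):
    """Where trajectory videos live; swap implementations for local vs. cloud."""

    @abstractmethod
    def save(self, video_path: Path, key: str) -> str:
        """Store the file under `key`; return a locator usable for display."""

    @abstractmethod
    def delete(self, key: str) -> None:
        """Remove a watched/unneeded video."""


class LocalVideoStore(VideoStore):
    """Simplest implementation: copy videos into a local directory."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, video_path: Path, key: str) -> str:
        dest = self.root / f"{key}.mp4"
        shutil.copy(video_path, dest)
        return str(dest)

    def delete(self, key: str) -> None:
        # missing_ok means cleanup is idempotent
        (self.root / f"{key}.mp4").unlink(missing_ok=True)
```

A cloud-backed implementation would only need to provide the same two methods, which is what would let this factor out into a separate service later.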
Steps:
- [ ] Inject option to store videos into training code (looks like I can just use VecEnv?)
- [ ] Build new interface/class for storing videos (so we could configure this to store in different directories or on the cloud, for example)
- [ ] Build new gatherer that requests user feedback
- [ ] Display videos to users in Jupyter or CLI
- [ ] Clean up watched/unneeded videos
- [ ] Build demo notebook
- [ ] Integrate into training script and test end to end
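The gatherer and display steps above could look roughly like the following synchronous loop. This is a sketch, not imitation's actual gatherer interface; the display and input callables are injected so the same function works in a CLI (`print`/`input`), in Jupyter (e.g. an `IPython.display` wrapper), and in tests:

```python
from typing import Callable, List, Sequence, Tuple


def gather_cli_preferences(
    video_pairs: Sequence[Tuple[str, str]],
    display: Callable[[str], None] = print,  # swap for a video widget in Jupyter
    ask: Callable[[str], str] = input,       # injected so tests can fake the user
) -> List[float]:
    """Synchronously ask a human which trajectory of each pair is better.

    Returns one preference per pair: 1.0 if the first video is preferred,
    0.0 if the second, 0.5 for "can't tell".
    """
    prefs: List[float] = []
    for first, second in video_pairs:
        display(f"Video A: {first}")
        display(f"Video B: {second}")
        while True:
            answer = ask("Prefer A, B, or same? [a/b/s] ").strip().lower()
            if answer in ("a", "b", "s"):
                break
            display("Please answer a, b, or s.")
        prefs.append({"a": 1.0, "b": 0.0, "s": 0.5}[answer])
    return prefs
```

Returning a probability-style value per pair (rather than a hard label) should make it easy to slot in wherever the synthetic gatherer's output currently goes.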
## Possible alternative solutions
Slightly more than MVP (out of scope for this issue):
- Refactor to support asynchronous preference gathering
  - Separate requesting preferences from receiving them
  - Periodically retrain with new preferences
    - Would be nice to have a way to indicate that new preferences are available
    - Might require changing the fragmenter, e.g. showing only the most recent pair to the user rather than all pairs
- Build as an asynchronous service
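Separating requesting preferences from receiving them could be as simple as two queues between the training loop and a human-facing worker. A minimal sketch with hypothetical names, to show the shape of the interface rather than a real design:

```python
import queue
from typing import Dict, Optional, Tuple


class AsyncPreferenceGatherer:
    """Decouples requesting preferences from receiving them.

    Training pushes fragment pairs into `request`; a human-facing worker
    (CLI, notebook, or eventually a web service) pulls them with
    `next_request` and answers via `submit`; training periodically drains
    whatever has arrived with `pop_new_preferences`.
    """

    def __init__(self) -> None:
        self._pending: "queue.Queue[Tuple[int, Tuple[str, str]]]" = queue.Queue()
        self._answered: "queue.Queue[Tuple[int, float]]" = queue.Queue()

    def request(self, pair_id: int, pair: Tuple[str, str]) -> None:
        self._pending.put((pair_id, pair))

    def next_request(self, timeout: Optional[float] = None) -> Tuple[int, Tuple[str, str]]:
        return self._pending.get(timeout=timeout)

    def submit(self, pair_id: int, preference: float) -> None:
        self._answered.put((pair_id, preference))

    def pop_new_preferences(self) -> Dict[int, float]:
        """Drain answered preferences without blocking; empty dict if none yet."""
        out: Dict[int, float] = {}
        while True:
            try:
                pair_id, pref = self._answered.get_nowait()
            except queue.Empty:
                return out
            out[pair_id] = pref
```

`pop_new_preferences` returning an empty dict doubles as the "are new preferences available?" signal mentioned above, and the `pair_id` keys are what would let a modified fragmenter show only the most recent pair.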