Algorithm wishlist (meta-issue)

Status: Open • qxcv opened this issue 5 years ago • 2 comments

Below are some algorithms that it would be nice to see in imitation, but which aren't urgently needed. Feel free to extend this list.

Learning from demonstrations:

Binary trajectory comparison/trajectory rankings:

  • [x] DRLHP. Adam has a basic implementation without human interaction; Matthew has a more full-fledged (though I'd guess less reliable?) version.
  • [ ] T-REX (likely hard, because it requires manually ranked trajectories rather than a single set of demonstrations of similar quality) and D-REX (easier: it only needs ordinary demonstrations of similar quality as input, and infers rankings by injecting noise into a BC policy).
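
For reference, the items above share one core idea: fit a reward model so that preferred trajectories get higher predicted return. Below is a minimal sketch, not imitation's API, assuming trajectories are plain lists of integer observations and a hypothetical per-observation `reward_fn`. It shows D-REX-style ranking (epsilon-greedy noise around a BC policy, lower noise assumed better) and the Bradley-Terry pairwise loss that T-REX trains on; DRLHP uses the same form of loss, but with human labels on short segment pairs. All names here (`noisy_policy`, `ranked_pairs`, `pairwise_loss`) are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Obs = int  # hypothetical: observations are plain integers
Policy = Callable[[Obs], int]
RewardFn = Callable[[Obs], float]  # hypothetical learned per-observation reward


def noisy_policy(bc_policy: Policy, epsilon: float, n_actions: int,
                 rng: random.Random) -> Policy:
    """D-REX-style noise injection: act uniformly at random with probability
    epsilon, otherwise follow the behavioural-cloning policy."""
    def act(obs: Obs) -> int:
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        return bc_policy(obs)
    return act


def ranked_pairs(trajs_by_noise: Dict[float, List[Sequence[Obs]]]
                 ) -> List[Tuple[Sequence[Obs], Sequence[Obs]]]:
    """Build (better, worse) pairs, assuming rollouts at a lower noise level are
    better. T-REX would instead take the ranking from a human."""
    levels = sorted(trajs_by_noise)
    pairs = []
    for i, low in enumerate(levels):
        for high in levels[i + 1:]:
            for better in trajs_by_noise[low]:
                for worse in trajs_by_noise[high]:
                    pairs.append((better, worse))
    return pairs


def pairwise_loss(reward_fn: RewardFn, better: Sequence[Obs],
                  worse: Sequence[Obs]) -> float:
    """Bradley-Terry cross-entropy -log P(better > worse), where
    P(better > worse) = sigmoid(R(better) - R(worse)) and R sums predicted rewards."""
    diff = sum(map(reward_fn, better)) - sum(map(reward_fn, worse))
    # softplus(-diff) = log(1 + exp(-diff)), computed without overflow
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def identity_reward(obs: Obs) -> float:
    """Stand-in for a learned reward model."""
    return float(obs)


if __name__ == "__main__":
    # Toy rollouts keyed by noise level (made-up data for illustration).
    pairs = ranked_pairs({0.0: [[2, 2]], 0.5: [[1, 2]], 1.0: [[0, 0]]})
    print(len(pairs), sum(pairwise_loss(identity_reward, b, w) for b, w in pairs))
```

In a real implementation the reward model would be a neural network and the summed loss would be minimized by gradient descent; the sketch only evaluates the objective.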

Learning from quantitative feedback:

qxcv, Dec 09 '19 23:12

I've added DRLHP. In general, I'd rather not focus solely on learning from demonstrations, since it has some severe limitations in terms of reward ambiguity.
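
To make "reward ambiguity" concrete, here is a minimal sketch on a made-up three-state chain MDP (the environment, `goal_reward`, and the potential `PHI` are all hypothetical). A reward, a positive affine transform of it, and a potential-based shaping of it all yield the same optimal policy, so demonstrations of that policy alone cannot distinguish between them.

```python
from typing import Callable, List

N_STATES, N_ACTIONS, GAMMA = 3, 2, 0.9  # hypothetical toy MDP

Reward = Callable[[int, int, int], float]  # r(s, a, s_next)


def step(s: int, a: int) -> int:
    """Deterministic chain: action 0 moves left, action 1 moves right."""
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)


def greedy_policy(reward: Reward, iters: int = 500) -> List[int]:
    """Run value iteration, then return the greedy action in each state."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [max(reward(s, a, step(s, a)) + GAMMA * v[step(s, a)]
                 for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    q = [[reward(s, a, step(s, a)) + GAMMA * v[step(s, a)]
          for a in range(N_ACTIONS)]
         for s in range(N_STATES)]
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]


def goal_reward(s: int, a: int, s_next: int) -> float:
    """Reward 1 for arriving at (or staying in) the rightmost state."""
    return 1.0 if s_next == N_STATES - 1 else 0.0


def affine_reward(s: int, a: int, s_next: int) -> float:
    """Positive affine transform: 3 * r + 1."""
    return 3.0 * goal_reward(s, a, s_next) + 1.0


PHI = [5.0, -2.0, 0.5]  # arbitrary potential function over states


def shaped_reward(s: int, a: int, s_next: int) -> float:
    """Potential-based shaping: r + gamma * PHI(s_next) - PHI(s)."""
    return goal_reward(s, a, s_next) + GAMMA * PHI[s_next] - PHI[s]


if __name__ == "__main__":
    for r in (goal_reward, affine_reward, shaped_reward):
        print(r.__name__, greedy_policy(r))
```

All three rewards produce the "always move right" policy. A constant-zero reward is also consistent with those demonstrations, since under it every policy is optimal.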

AdamGleave, Dec 10 '19 00:12

We now have a DRLHP implementation thanks to https://github.com/HumanCompatibleAI/imitation/pull/320

We'd welcome PRs for the others too, but we'll be prioritizing making the existing algorithms solid and easy to use, and adding any algorithms needed for our own immediate use cases.

AdamGleave, Sep 04 '21 00:09