imitation
Algorithm wishlist (meta-issue)
Below are some algorithms that it would be nice to see in imitation, but which aren't urgently needed. Feel free to extend this list.
Learning from demonstrations:
- [ ] IQ-Learn
- [ ] SQIL
- [ ] InfoGAIL (should be straightforward extension of GAIL)
- [ ] PEMIRL (i.e. meta-AIRL)
Binary trajectory comparison/trajectory rankings:
- [x] DRLHP. Adam has a basic implementation without human interaction; Matthew has a more full-fledged (but I'd guess less reliable?) version.
- [ ] T-REX and D-REX. T-REX is going to be hard because it requires manually ranked trajectories rather than a single set of demonstrations of similar quality. D-REX is easier: it only requires ordinary, similar-quality demonstrations as input and can infer rankings by adding noise to a BC policy.
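
To make the D-REX item a bit more concrete, here is a rough, toy sketch of the "rankings from BC with noise" idea. It is not code from imitation and not a faithful D-REX implementation: the rollout function is a stand-in for a real noisy BC policy, the reward model is a single scalar weight, and every name (`noisy_rollout`, `ranked_pairs`, `fit_reward_weight`) is hypothetical. It assumes that rollouts with less injected noise are better, turns that into preference pairs, and fits the reward with a Bradley-Terry style loss.

```python
import math
import random
from itertools import combinations


def noisy_rollout(noise, horizon=20, rng=None):
    """Toy stand-in for rolling out a cloned policy with epsilon-greedy noise.

    Each step yields a 1-D feature: 1.0 if the (hypothetical) BC action was
    kept, 0.0 if a random action replaced it.
    """
    rng = rng or random.Random(0)
    return [0.0 if rng.random() < noise else 1.0 for _ in range(horizon)]


def ranked_pairs(noise_levels=(0.0, 0.5, 1.0), rollouts_per_level=5, seed=0):
    """Build (better, worse) pairs, assuming lower noise means a better rollout."""
    rng = random.Random(seed)
    data = [(eps, noisy_rollout(eps, rng=rng))
            for eps in noise_levels for _ in range(rollouts_per_level)]
    pairs = []
    for (eps_a, traj_a), (eps_b, traj_b) in combinations(data, 2):
        if eps_a < eps_b:
            pairs.append((traj_a, traj_b))
        elif eps_b < eps_a:
            pairs.append((traj_b, traj_a))
        # Pairs from the same noise level carry no ranking and are skipped.
    return pairs


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_loss(w, pairs):
    """Mean Bradley-Terry loss: -log sigmoid(R(better) - R(worse)), R = w * sum(features)."""
    total = 0.0
    for better, worse in pairs:
        diff = w * (sum(better) - sum(worse))
        # Stable softplus(-diff), which equals -log sigmoid(diff).
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(pairs)


def fit_reward_weight(pairs, lr=0.01, steps=200):
    """Plain gradient descent on the scalar reward weight w, starting from 0."""
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for better, worse in pairs:
            gap = sum(better) - sum(worse)
            grad += -gap * _sigmoid(-w * gap)  # d/dw of -log sigmoid(w * gap)
        w -= lr * grad / len(pairs)
    return w


pairs = ranked_pairs()
w = fit_reward_weight(pairs)
print(f"learned weight: {w:.3f}, loss: {preference_loss(w, pairs):.3f}")
```

The learned weight should come out positive, i.e. the reward model prefers steps where the cloned action was kept. A real version would replace the scalar weight with a reward network and the toy rollouts with environment rollouts of an actual BC policy.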
Learning from quantitative feedback:
- [ ] Deep TAMER
- [ ] COACH
- [ ] Reward sketching
I've added DRLHP. In general I'd like to not focus just on learning from demonstrations, given that demonstrations alone have some severe limitations in terms of reward ambiguity.
We now have a DRLHP implementation thanks to https://github.com/HumanCompatibleAI/imitation/pull/320
Would welcome PRs on the others, too, but will be prioritizing making the existing algorithms solid and easy to use and adding any algorithms that are needed for our own immediate use cases.