imitation
Clean PyTorch implementations of imitation and reward learning algorithms
## Description Support for stable-baselines3 style callbacks in adversarial training. This feature was partly addressed in #626, but that PR appears to have been abandoned. ## Testing Tests in...
## Description This PR updates the adversarial algorithm so that the discriminator is trained between collecting the generator's rollouts and training the generator. This matches the reference implementation provided in...
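The reordering described above can be sketched as a minimal training round (a sketch with stub callables to illustrate the sequencing; the function names are hypothetical, not imitation's actual API):

```python
def adversarial_training_round(collect_rollouts, train_discriminator, train_generator):
    """One round of adversarial imitation training, with the discriminator
    updated between rollout collection and the generator update."""
    rollouts = collect_rollouts()    # 1. sample trajectories from the current generator
    train_discriminator(rollouts)    # 2. fit discriminator on fresh rollouts vs. expert data
    train_generator(rollouts)        # 3. update the generator against the refreshed discriminator

# Record the call order with stubs to show the sequencing this PR enforces.
order = []
adversarial_training_round(
    collect_rollouts=lambda: order.append("collect") or [],
    train_discriminator=lambda r: order.append("disc"),
    train_generator=lambda r: order.append("gen"),
)
print(order)  # ['collect', 'disc', 'gen']
```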
## Problem Robotic environments such as [SurRoL](https://github.com/med-air/SurRoL) and [Fetch](https://robotics.farama.org/envs/fetch/) use a Dictionary observation space with the keys 1. observation 2. desired_goal 3. achieved_goal. ## Query Is there a quick fix...
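As a stopgap for algorithms that expect flat arrays, dict observations with those three keys can be concatenated into a single vector (a NumPy sketch; the key names follow the issue, everything else is illustrative, and it is not imitation's built-in handling):

```python
import numpy as np

def flatten_dict_obs(obs, keys=("observation", "desired_goal", "achieved_goal")):
    """Concatenate the named entries of a goal-conditioned dict observation
    into one flat float array, in a fixed key order."""
    return np.concatenate([np.asarray(obs[k], dtype=np.float64).ravel() for k in keys])

# Example: a Fetch-style observation with the three standard keys.
obs = {
    "observation": np.zeros(10),
    "desired_goal": np.ones(3),
    "achieved_goal": np.ones(3),
}
flat = flatten_dict_obs(obs)
print(flat.shape)  # (16,)
```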
I am testing whether 'imitation' works with proprietary environments for the robot in our MuJoCo-based lab. For testing, I am generating Pygame environments based on Gym. I am creating user...
## Problem Due to [this validation](https://github.com/HumanCompatibleAI/imitation/blob/5c85ebf02a591dad171946710d80617cfcca108e/src/imitation/data/types.py#L131), environments that return integer rewards raise an exception, e.g. when I try to collect rollouts from an expert policy. This seems a bit overzealous....
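Until the validation is relaxed, one workaround is a thin wrapper that casts rewards to float before they reach the check (a self-contained sketch with a stub environment; the pattern mirrors Gym's `RewardWrapper`, but the class and env here are illustrative only):

```python
class FloatRewardWrapper:
    """Wrap an environment so step() always returns a float reward,
    avoiding validation errors on environments that emit integers."""
    def __init__(self, env):
        self.env = env

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, float(reward), done, info

# Stub environment returning an integer reward, for illustration only.
class IntRewardEnv:
    def step(self, action):
        return [0.0], 1, False, {}

env = FloatRewardWrapper(IntRewardEnv())
obs, reward, done, info = env.step(0)
print(type(reward).__name__)  # float
```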
## Problem Today only synthetic preferences are supported. It would be great to support real human preferences. ## Solution Requirements: - record videos of trajectories - ideally, extensible so we...
## Description See #711 ## Testing TODO: add notebook and experiment config that use this feature, and screenshots of behavior. (I've tested myself but not in a clean way.)
## Description Here are two scripts I used for checking for type errors in documentation and notebooks. I don't know whether this is of use to anyone, so I figured...
Right now we use Sacred to run experiments/algorithms. This PR is about exploring whether Hydra would be a good option for running experiments and constructing/configuring the CLI interface of `imitation`....
I wouldn't mind seeing something discussing whether these trajectory objects can only be used for imitation algorithms, or can also be used with stable-baselines3 or offline RL algorithms, and...
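For context on that question, a trajectory of observations, actions, and rewards can be unrolled into (s, a, r, s') transitions of the kind offline RL replay buffers consume (a generic sketch; the field layout is illustrative and is not imitation's actual `Trajectory` API):

```python
def trajectory_to_transitions(obs, acts, rews):
    """Turn a trajectory with len(obs) == len(acts) + 1 into a list of
    (state, action, reward, next_state) tuples for an offline RL buffer."""
    assert len(obs) == len(acts) + 1 == len(rews) + 1
    return [(obs[t], acts[t], rews[t], obs[t + 1]) for t in range(len(acts))]

# Example: a 3-step trajectory with 4 observations.
transitions = trajectory_to_transitions(
    obs=[0, 1, 2, 3], acts=["a", "b", "c"], rews=[1.0, 0.0, 1.0]
)
print(len(transitions))  # 3
```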