imitation
Clean PyTorch implementations of imitation and reward learning algorithms
The original [deep RL from human preferences paper](https://arxiv.org/pdf/1706.03741.pdf) uses an ensemble of reward models. It then selects queries for comparison that have the highest disagreement between models, a proxy for...
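Concretely, disagreement-based selection can be sketched as follows: score each candidate clip pair by the variance of the Bradley-Terry preference probability across ensemble members, and query the pairs where the models disagree most. The function names here are illustrative, not the paper's or this repo's API:

```python
import torch


def preference_prob(reward_model, clip_a, clip_b):
    """P(clip_a preferred over clip_b) under one reward model (Bradley-Terry)."""
    r_a = reward_model(clip_a).sum()
    r_b = reward_model(clip_b).sum()
    return torch.sigmoid(r_a - r_b)


def select_queries(ensemble, candidate_pairs, n_queries):
    """Pick the pairs where the ensemble's preference predictions vary most."""
    scores = []
    for clip_a, clip_b in candidate_pairs:
        probs = torch.stack([preference_prob(m, clip_a, clip_b) for m in ensemble])
        # population variance across ensemble members = disagreement score
        scores.append(((probs - probs.mean()) ** 2).mean())
    order = torch.argsort(torch.stack(scores), descending=True)
    return [candidate_pairs[i] for i in order[:n_queries]]
```

Variance of the predicted probability is only a proxy for epistemic uncertainty, but it is cheap to compute and needs no extra training machinery beyond the ensemble itself.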
At the moment, all of our test environments have tabular observation spaces. It would be nice to include an example with a more complex observation space, like stacked images in...
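The core mechanics of a stacked-image observation can be sketched without any Gym dependency: keep the last `k` frames in a bounded deque and stack them along the channel axis. This is a minimal illustration, not a proposed test environment:

```python
from collections import deque

import numpy as np


class FrameStacker:
    """Stack the most recent k frames along the channel axis."""

    def __init__(self, k: int):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame is dropped automatically

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        # Common convention: pad the stack by repeating the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.observation()

    def step_frame(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)
        return self.observation()

    def observation(self) -> np.ndarray:
        # (H, W, C) frames become a single (H, W, C * k) observation.
        return np.concatenate(list(self.frames), axis=-1)
```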
There are some differences between stable-baselines' `VecNormalize` and imitation's `RunningNorm`/`NormalizedRewardFunction` that might cause performance regressions. `VecNormalize` normalizes based on an estimate of rewards so far in the episode....
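For comparison, here is a minimal sketch of the standardize-against-all-rewards-seen-so-far behaviour (the class name and interface are illustrative, not imitation's actual API), using Welford's algorithm for a numerically stable running mean and variance:

```python
class RunningRewardNorm:
    """Standardize rewards against a running mean/variance of all rewards seen."""

    def __init__(self, eps: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
        self.eps = eps

    def update(self, reward: float) -> None:
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward: float) -> float:
        var = self.m2 / max(self.count, 1)
        return (reward - self.mean) / (var + self.eps) ** 0.5
```

SB3's `VecNormalize`, by contrast, scales rewards by the standard deviation of a running estimate of the discounted return and does not subtract a mean, so the two schemes produce differently scaled (and differently centered) rewards, which is one plausible source of a regression.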
Hi, I have another question about `airl_trainer.train`. It has a `callback` parameter; I want to evaluate the model periodically and save the best one, like in stable...
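One shape such a callback could take (assuming, as the question suggests, that `train` invokes the callback once per round with the round number; the `evaluate` and `save` callables are injected so the sketch stays self-contained):

```python
class BestPolicySaver:
    """Callback: every `every` rounds, evaluate and save if it's a new best."""

    def __init__(self, evaluate, save, every: int = 10):
        self.evaluate = evaluate  # () -> float, e.g. mean episode reward
        self.save = save          # () -> None, e.g. policy.save(path)
        self.every = every
        self.best = float("-inf")

    def __call__(self, round_num: int) -> None:
        if round_num % self.every != 0:
            return
        score = self.evaluate()
        if score > self.best:
            self.best = score
            self.save()


# Hypothetical usage with the trainer from the question (names assumed):
# saver = BestPolicySaver(
#     evaluate=lambda: evaluate_policy(airl_trainer.policy, eval_env)[0],
#     save=lambda: airl_trainer.policy.save("best_policy"),
# )
# airl_trainer.train(total_timesteps=100_000, callback=saver)
```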
This PR adds some features necessary to get clean support for image-based environments:

- Switched to a (temporary) fork of SB3 that removes transpose magic. This makes it simpler for...
## Background

- Previously, [`exploration_frac`](https://github.com/HumanCompatibleAI/imitation/blob/3d7a76b8c587a25e380aeb09f65b764d7693aeea/src/imitation/algorithms/preference_comparisons.py#L212) was implemented to add exploratory trajectories for preference comparisons. Its aim was to add diversity to the dataset in order to escape from local minima...
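Conceptually, the mechanism can be sketched as follows (function names are hypothetical; the real implementation lives in `preference_comparisons.py`): a fraction of the sampled trajectories comes from an exploratory policy rather than the current one, diversifying the preference dataset.

```python
import random


def sample_trajectories(n: int, exploration_frac: float,
                        sample_current, sample_exploratory):
    """Return n trajectories, roughly exploration_frac of them exploratory."""
    n_explore = int(round(n * exploration_frac))
    trajs = [sample_exploratory() for _ in range(n_explore)]
    trajs += [sample_current() for _ in range(n - n_explore)]
    random.shuffle(trajs)  # avoid ordering bias when pairing clips later
    return trajs
```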
`rollout.PolicyCallable` takes an observation and outputs an action, which only supports stateless policies. By contrast, `BasePolicy.predict` takes an observation, an episode-start mask (whether the previous observation was terminal) and a state (which is reset when...
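One way to bridge the two interfaces is an adapter that carries the recurrent state internally, exposing the plain observation-to-action callable on the outside. The class and method names here are illustrative, not a proposed API:

```python
import numpy as np


class StatefulPolicyAdapter:
    """Adapt predict(obs, state, episode_start) -> (actions, state)
    to a plain obs -> actions callable by threading state internally."""

    def __init__(self, predict_fn, n_envs: int):
        self.predict_fn = predict_fn
        self.state = None
        # All episodes count as freshly started before the first step.
        self.episode_start = np.ones(n_envs, dtype=bool)

    def __call__(self, obs):
        actions, self.state = self.predict_fn(obs, self.state, self.episode_start)
        self.episode_start = np.zeros_like(self.episode_start)
        return actions

    def notify_dones(self, dones):
        """Call after each env step so finished episodes reset their state."""
        self.episode_start = np.asarray(dones, dtype=bool)
```

The caller must invoke `notify_dones` with the vectorized env's `dones` each step; without that hook, the hidden state would leak across episode boundaries.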
The expert demonstrations are referred to in the code base as:

- episodes
- trajectories
- lists of transitions
- rollouts
- demonstrations

I think we should clarify in the documentation...
I still have a mild concern about the increase in test suite runtime, but this is something we can address in another PR if it proves sufficiently annoying. _Originally posted...
Hi, I am currently experimenting with a couple of imitation learning algorithms. I recently found [this](https://arxiv.org/abs/2106.12142) paper and was planning to take a shot at implementing it. Do you...