imitation

Clean PyTorch implementations of imitation and reward learning algorithms

Results 146 imitation issues

The original [deep RL from human preferences paper](https://arxiv.org/pdf/1706.03741.pdf) uses an ensemble of reward models. It then selects queries for comparison that have the highest disagreement between models, a proxy for...
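A minimal sketch of disagreement-based query selection, assuming each ensemble member outputs a preference probability for every candidate comparison and that disagreement is measured as the variance of those probabilities across members. The function name and input layout are hypothetical and are not taken from imitation's API.

```python
from statistics import pvariance
from typing import List, Sequence


def select_queries_by_disagreement(
    preference_probs: Sequence[Sequence[float]],
    num_queries: int,
) -> List[int]:
    """Return indices of the candidate comparisons the ensemble disagrees on most.

    preference_probs[m][i] is the probability (assumed layout) that ensemble
    member m assigns to "first segment preferred" for candidate comparison i.
    """
    n_candidates = len(preference_probs[0])
    # Disagreement per candidate: variance of the members' predictions.
    disagreement = [
        pvariance([member[i] for member in preference_probs])
        for i in range(n_candidates)
    ]
    # Rank candidates from highest to lowest disagreement.
    ranked = sorted(range(n_candidates), key=lambda i: disagreement[i], reverse=True)
    return ranked[:num_queries]


# Three ensemble members scoring three candidate comparisons.
probs = [
    [0.5, 0.9, 0.10],
    [0.5, 0.1, 0.20],
    [0.5, 0.5, 0.15],
]
selected = select_queries_by_disagreement(probs, num_queries=2)
```

Here the members agree exactly on candidate 0, so it is never selected ahead of the other two.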

At the moment, all of our test environments have tabular observation spaces. It would be nice to include an example with a more complex observation space, like stacked images in...

enhancement

There are some differences between stable-baselines' `VecNormalize` and imitation's `RunningNorm`/`NormalizedRewardFunction` that might cause performance regressions. The normalization in `VecNormalize` is based on an estimate of rewards so far in the episode....

Hi, I have another question about `airl_trainer.train`. It has a `callback` parameter. I want to evaluate the model periodically and save the best one, like in stable...

This PR adds some features necessary to get clean support for image-based environments: - Switched to a (temporary) fork of SB3 that removes transpose magic. This makes it simpler for...

## Background - Previously, [`exploration_frac`](https://github.com/HumanCompatibleAI/imitation/blob/3d7a76b8c587a25e380aeb09f65b764d7693aeea/src/imitation/algorithms/preference_comparisons.py#L212) was implemented to add exploratory trajectories for preference comparisons. Its aim was to add diversity to the dataset in order to escape from local minimum...

`rollout.PolicyCallable` takes an observation and outputs an action. This only supports stateless policies. By contrast, `BasePolicy.predict` takes an observation, mask (is it terminal observation) and state (which is reset when...
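To illustrate the gap between the two interfaces, here is a rough sketch of an adapter that wraps a stateful, predict-style function so it can be called with an observation alone. The class `StatefulPolicyAdapter`, the three-argument `predict_fn` signature, and the toy `counting_predict` policy are all hypothetical and simplified; they are not imitation's or stable-baselines' actual API.

```python
from typing import Any, Callable, Optional, Tuple

# Assumed signature: (observation, state, episode_start) -> (action, new_state)
PredictFn = Callable[[Any, Optional[Any], bool], Tuple[Any, Any]]


class StatefulPolicyAdapter:
    """Expose a stateful predict-style function as an observation-only callable."""

    def __init__(self, predict_fn: PredictFn) -> None:
        self._predict_fn = predict_fn
        self._state: Optional[Any] = None
        self._episode_start = True

    def reset(self) -> None:
        """Forget the recurrent state; call this at the start of each episode."""
        self._state = None
        self._episode_start = True

    def __call__(self, obs: Any) -> Any:
        action, self._state = self._predict_fn(obs, self._state, self._episode_start)
        self._episode_start = False
        return action


def counting_predict(obs: int, state: Optional[int], episode_start: bool) -> Tuple[int, int]:
    """Toy stateful policy: the state counts steps taken in the current episode."""
    count = 0 if (episode_start or state is None) else state
    return obs + count, count + 1


policy = StatefulPolicyAdapter(counting_predict)
first = policy(10)   # step 0 of the episode
second = policy(10)  # step 1: same observation, different action
policy.reset()
after_reset = policy(10)
```

The same observation yields different actions depending on the hidden state, which is exactly what an observation-only callable cannot express without an adapter like this holding the state for it.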

enhancement

The expert demonstrations are referred to in the codebase as episodes, trajectories, lists of transitions, rollouts, and demonstrations. I think we should clarify in the documentation...

I still have a mild concern about the increase in test suite runtime, but this is something we can address in another PR if it proves sufficiently annoying. _Originally posted...

Hi, I am currently experimenting with a couple of imitation learning algorithms. I recently found [this](https://arxiv.org/abs/2106.12142) paper and was planning on taking a shot at implementing it. Do you...