imitation
Clean PyTorch implementations of imitation and reward learning algorithms
The original [deep RL from human preferences paper](https://arxiv.org/pdf/1706.03741.pdf) uses an ensemble of reward models. It then selects queries for comparison that have the highest disagreement between models, a proxy for...
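Concretely, disagreement-based selection can be sketched as follows: score each candidate clip pair by the variance of the Bradley-Terry preference probability across ensemble members, and query the pairs where the models disagree most. The function names here are illustrative, not the paper's or this repo's API:

```python
import torch


def preference_prob(reward_model, clip_a, clip_b):
    """P(clip_a preferred over clip_b) under one reward model (Bradley-Terry)."""
    r_a = reward_model(clip_a).sum()
    r_b = reward_model(clip_b).sum()
    return torch.sigmoid(r_a - r_b)


def select_queries(ensemble, candidate_pairs, n_queries):
    """Pick the pairs where the ensemble's preference predictions vary most."""
    scores = []
    for clip_a, clip_b in candidate_pairs:
        probs = torch.stack([preference_prob(m, clip_a, clip_b) for m in ensemble])
        # population variance across ensemble members = disagreement score
        scores.append(((probs - probs.mean()) ** 2).mean())
    order = torch.argsort(torch.stack(scores), descending=True)
    return [candidate_pairs[i] for i in order[:n_queries]]
```

Variance of the predicted probability is only a proxy for epistemic uncertainty, but it is cheap to compute and needs no extra training machinery beyond the ensemble itself.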
At the moment, all of our test environments have tabular observation spaces. It would be nice to include an example with a more complex observation space, like stacked images in...
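The core mechanics of a stacked-image observation can be sketched without any Gym dependency: keep the last `k` frames in a bounded deque and stack them along the channel axis. This is a minimal illustration, not a proposed test environment:

```python
from collections import deque

import numpy as np


class FrameStacker:
    """Stack the most recent k frames along the channel axis."""

    def __init__(self, k: int):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame is dropped automatically

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        # Common convention: pad the stack by repeating the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.observation()

    def step_frame(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)
        return self.observation()

    def observation(self) -> np.ndarray:
        # (H, W, C) frames become a single (H, W, C * k) observation.
        return np.concatenate(list(self.frames), axis=-1)
```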
There are some differences between stable-baselines' `VecNormalize` and imitation's `RunningNorm`/`NormalizedRewardFunction` that might cause performance regressions. `VecNormalize` normalizes based on an estimate of rewards so far in the episode....
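For comparison, here is a minimal sketch of the standardize-against-all-rewards-seen-so-far behaviour (the class name and interface are illustrative, not imitation's actual API), using Welford's algorithm for a numerically stable running mean and variance:

```python
class RunningRewardNorm:
    """Standardize rewards against a running mean/variance of all rewards seen."""

    def __init__(self, eps: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
        self.eps = eps

    def update(self, reward: float) -> None:
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward: float) -> float:
        var = self.m2 / max(self.count, 1)
        return (reward - self.mean) / (var + self.eps) ** 0.5
```

SB3's `VecNormalize`, by contrast, scales rewards by the standard deviation of a running estimate of the discounted return and does not subtract a mean, so the two schemes produce differently scaled (and differently centered) rewards, which is one plausible source of a regression.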
Hi, I have another question about `airl_trainer.train`. It has a `callback` parameter; I want to evaluate the model periodically and save the best one, like in stable...
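One shape such a callback could take (assuming, as the question suggests, that `train` invokes the callback once per round with the round number; the `evaluate` and `save` callables are injected so the sketch stays self-contained):

```python
class BestPolicySaver:
    """Callback: every `every` rounds, evaluate and save if it's a new best."""

    def __init__(self, evaluate, save, every: int = 10):
        self.evaluate = evaluate  # () -> float, e.g. mean episode reward
        self.save = save          # () -> None, e.g. policy.save(path)
        self.every = every
        self.best = float("-inf")

    def __call__(self, round_num: int) -> None:
        if round_num % self.every != 0:
            return
        score = self.evaluate()
        if score > self.best:
            self.best = score
            self.save()


# Hypothetical usage with the trainer from the question (names assumed):
# saver = BestPolicySaver(
#     evaluate=lambda: evaluate_policy(airl_trainer.policy, eval_env)[0],
#     save=lambda: airl_trainer.policy.save("best_policy"),
# )
# airl_trainer.train(total_timesteps=100_000, callback=saver)
```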
This PR adds some features necessary to get clean support for image-based environments:

- Switched to a (temporary) fork of SB3 that removes transpose magic. This makes it simpler for...
## Background

- Previously, [`exploration_frac`](https://github.com/HumanCompatibleAI/imitation/blob/3d7a76b8c587a25e380aeb09f65b764d7693aeea/src/imitation/algorithms/preference_comparisons.py#L212) was implemented to add exploratory trajectories for preference comparisons. Its aim was to add diversity to the dataset in order to escape from local minima...
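Conceptually, the mechanism can be sketched as follows (function names are hypothetical; the real implementation lives in `preference_comparisons.py`): a fraction of the sampled trajectories comes from an exploratory policy rather than the current one, diversifying the preference dataset.

```python
import random


def sample_trajectories(n: int, exploration_frac: float,
                        sample_current, sample_exploratory):
    """Return n trajectories, roughly exploration_frac of them exploratory."""
    n_explore = int(round(n * exploration_frac))
    trajs = [sample_exploratory() for _ in range(n_explore)]
    trajs += [sample_current() for _ in range(n - n_explore)]
    random.shuffle(trajs)  # avoid ordering bias when pairing clips later
    return trajs
```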
`rollout.PolicyCallable` takes an observation and outputs an action, which only supports stateless policies. By contrast, `BasePolicy.predict` takes an observation, an episode-start mask (whether the previous observation was terminal) and a state (which is reset when...
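One way to bridge the two interfaces is an adapter that carries the recurrent state internally, exposing the plain observation-to-action callable on the outside. The class and method names here are illustrative, not a proposed API:

```python
import numpy as np


class StatefulPolicyAdapter:
    """Adapt predict(obs, state, episode_start) -> (actions, state)
    to a plain obs -> actions callable by threading state internally."""

    def __init__(self, predict_fn, n_envs: int):
        self.predict_fn = predict_fn
        self.state = None
        # All episodes count as freshly started before the first step.
        self.episode_start = np.ones(n_envs, dtype=bool)

    def __call__(self, obs):
        actions, self.state = self.predict_fn(obs, self.state, self.episode_start)
        self.episode_start = np.zeros_like(self.episode_start)
        return actions

    def notify_dones(self, dones):
        """Call after each env step so finished episodes reset their state."""
        self.episode_start = np.asarray(dones, dtype=bool)
```

The caller must invoke `notify_dones` with the vectorized env's `dones` each step; without that hook, the hidden state would leak across episode boundaries.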
The expert demonstrations are referred to in the code base as:

- episodes
- trajectories
- lists of transitions
- rollouts
- demonstrations

I think we should clarify in the documentation...
I still have a mild concern about the increase in test suite runtime, but this is something we can address in another PR if it proves sufficiently annoying. _Originally posted...
Hi, I am currently experimenting with a couple of imitation learning algorithms. I recently found [this](https://arxiv.org/abs/2106.12142) paper and was planning to take a shot at implementing it. Do you...